Consider two tables, tableA and tableB:
tableA
|id|driver_id|vehicle_id|is_allowed|license_number|driver_name|
tableB
|id|driver_id|vehicle_id|offence|payable_amount|driver_name|
Goal: find the driver_id and vehicle_id of the allowed driver whose name is XYZ.
Query1:SELECT * FROM tableA,tableB {join-condition}{filter-condition}
SELECT tableA.driver_id,tableA.vehicle_id FROM tableA,tableB
WHERE
tableA.driver_id=tableB.driver_id AND
tableA.vehicle_id=tableB.vehicle_id AND
tableA.driver_name='XYZ' AND
tableB.driver_name='XYZ' AND
tableA.is_allowed = 1
Query2:SELECT * FROM (SELECT * FROM tableA {filter-condition}) JOIN (SELECT * FROM tableB {filter-condition}) ON {join-condition}{filter-condition}
SELECT tableAA.driver_id,tableAA.vehicle_id FROM
(SELECT tableA.driver_id,tableA.vehicle_id from tableA WHERE tableA.driver_name='XYZ' AND
tableA.is_allowed = 1) as tableAA
JOIN
(SELECT tableB.driver_id,tableB.vehicle_id from tableB WHERE tableB.driver_name='XYZ') as tableBB
ON
tableAA.driver_id=tableBB.driver_id AND
tableAA.vehicle_id=tableBB.vehicle_id
Which of these two query styles is more readable, better optimized, and closer to the standard?
A correct version would look like this:
SELECT a.driver_id, a.vehicle_id
FROM tableA a JOIN
tableB b
ON a.driver_id = b.driver_id AND
a.vehicle_id = b.vehicle_id
WHERE a.driver_name = 'XYZ' AND
b.driver_name = 'XYZ' AND
a.is_allowed = 1;
Notes:
JOIN is accepted as the right way to combine tables in the FROM clause. Simple rule: Never use commas in the FROM clause.
The ON clause should contain all predicates that contain columns from more than one table.
The use of table aliases is a preference that makes queries easier to write and to read.
You might want to use IN or EXISTS, because your query is not returning columns from TableB.
Do not use unnecessary subqueries in the FROM clause. In some databases (notably MySQL), this impedes the use of indexes and adds additional overhead for materialization of the intermediate table.
And, the answer to your question is that the first version is probably the optimized version (because it does not materialize subqueries unnecessarily). Neither version is preferred.
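To check that the three spellings discussed above return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The engine choice and the sample rows are assumptions made for illustration (the thread does not name a database), and the sketch says nothing about relative performance.

```python
import sqlite3

# In-memory SQLite database; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY, driver_id INT, vehicle_id INT,
                     is_allowed INT, license_number TEXT, driver_name TEXT);
CREATE TABLE tableB (id INTEGER PRIMARY KEY, driver_id INT, vehicle_id INT,
                     offence TEXT, payable_amount REAL, driver_name TEXT);
INSERT INTO tableA VALUES (1, 10, 100, 1, 'L-1', 'XYZ'),
                          (2, 11, 101, 0, 'L-2', 'XYZ'),
                          (3, 12, 102, 1, 'L-3', 'ABC');
INSERT INTO tableB VALUES (1, 10, 100, 'speeding', 50.0, 'XYZ'),
                          (2, 11, 101, 'parking', 20.0, 'XYZ'),
                          (3, 12, 102, 'speeding', 50.0, 'ABC');
""")

# Query1 style: comma join, everything in WHERE.
comma_sql = """
SELECT tableA.driver_id, tableA.vehicle_id FROM tableA, tableB
WHERE tableA.driver_id = tableB.driver_id
  AND tableA.vehicle_id = tableB.vehicle_id
  AND tableA.driver_name = 'XYZ' AND tableB.driver_name = 'XYZ'
  AND tableA.is_allowed = 1
"""

# Explicit JOIN: join predicates in ON, filters in WHERE.
join_sql = """
SELECT a.driver_id, a.vehicle_id
FROM tableA a
JOIN tableB b
  ON a.driver_id = b.driver_id AND a.vehicle_id = b.vehicle_id
WHERE a.driver_name = 'XYZ' AND b.driver_name = 'XYZ' AND a.is_allowed = 1
"""

# Query2 style: filter each table in a derived table, then join.
derived_sql = """
SELECT aa.driver_id, aa.vehicle_id
FROM (SELECT driver_id, vehicle_id FROM tableA
      WHERE driver_name = 'XYZ' AND is_allowed = 1) AS aa
JOIN (SELECT driver_id, vehicle_id FROM tableB
      WHERE driver_name = 'XYZ') AS bb
  ON aa.driver_id = bb.driver_id AND aa.vehicle_id = bb.vehicle_id
"""

comma_rows = conn.execute(comma_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
derived_rows = conn.execute(derived_sql).fetchall()
print(comma_rows, join_rows, derived_rows)
```

All three forms are logically equivalent here; the difference the answer describes is about readability and about how a given optimizer treats the derived tables.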
The first one is better in terms of standards and performance, but its comma-join syntax is very old-fashioned, so it can be rewritten like this:
SELECT tableA.driver_id,tableA.vehicle_id
FROM tableA
INNER JOIN tableB ON tableA.driver_id=tableB.driver_id
AND tableA.vehicle_id=tableB.vehicle_id
AND tableA.driver_name='XYZ'
AND tableB.driver_name='XYZ'
AND tableA.is_allowed = 1
I have the following two queries and noticed that they have a huge performance difference.
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
The queries basically join one table to another. I noticed that Query1 takes about 80 ms whereas Query2 takes about 2 s with thousands of rows in my system. Could anyone explain why this happens? Is it wise to use the Query2 style only when I am forced to? Or is there a better way to do the same thing than Query2?
When you replace tableB with (SELECT * FROM tableB) you are forcing the query engine to materialize a subquery, or intermediate table result. In other words, in the second query, you aren't actually joining directly to tableB, you are joining to some intermediate table. As a result of this, any indices which might have existed on tableB to make the query faster would not be available. Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
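As a sketch of a case where the subquery earns its place, the derived table below aggregates tableB per aId before the join. It uses Python's sqlite3 module; the amount column and the sample rows are hypothetical and not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY);
-- 'amount' is a hypothetical column added for this illustration.
CREATE TABLE tableB (id INTEGER PRIMARY KEY, aId INT, amount INT);
INSERT INTO tableA VALUES (1), (2), (3);
INSERT INTO tableB VALUES (1, 1, 5), (2, 1, 7), (3, 2, 4);
""")

# The subquery transforms tableB (one row per aId) before it is joined,
# which a plain "LEFT JOIN tableB" cannot express on its own.
rows = conn.execute("""
SELECT a.id, COALESCE(b.cnt, 0) AS cnt, COALESCE(b.total, 0) AS total
FROM tableA AS a
LEFT JOIN (SELECT aId, COUNT(*) AS cnt, SUM(amount) AS total
           FROM tableB
           GROUP BY aId) AS b ON a.id = b.aId
ORDER BY a.id
""").fetchall()
print(rows)
```

A bare (SELECT * FROM tableB) does no such transformation, which is why it only adds cost in the question's Query2.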
I am not sure if this is possible. But is it possible to do a join on 2 tables, but return the data for only one of the tables. I want to join the two tables based on a condition, but I only want the data for one of the tables. Is this possible with SQL, if so how? After reading the docs, it seems that when you do a join you get the data for both tables. Thanks for any help!
You get data from both tables because join is based on "Cartesian Product" + "Selection". But after the join, you can do a "Projection" with desired columns.
SQL has an easy syntax for this:
Select t1.* --taking data just from one table
from one_table t1
inner join other_table t2
on t1.pk = t2.fk
You can choose the table through its alias: t1.* or t2.*. The symbol * means "all fields".
You can also add a WHERE clause, an ORDER BY, or other join types such as an outer join or a cross join.
A typical SQL query has multiple clauses.
The SELECT clause mentions the columns you want in your result set.
The FROM clause, which includes JOIN operations, mentions the tables from which you want to retrieve those columns.
The WHERE clause filters the result set.
The ORDER BY clause specifies the order in which the rows in your result set are presented.
There are a few other clauses like GROUP BY and LIMIT. You can read about those.
To do what you ask, select the columns you want, then mention the tables you want. Something like this.
SELECT t1.id, t1.name, t1.address
FROM t1
JOIN t2 ON t2.t1_id = t1.id
This gives you data from t1 from rows that match t2.
Pro tip: Avoid the use of SELECT *. Instead, mention the columns you want.
This would typically be done using exists (or in) if you prefer:
select t1.*
from table1 t1
where exists (select 1 from table2 t2 where t2.x = t1.y);
Although you can use join, it runs the risk of multiplying the number of rows in the result set -- if there are duplicate matches in table2. There is no danger of such duplicates using exists (or in). I also find the logic to be more natural.
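The row-multiplication risk can be seen in a small sketch with Python's sqlite3 module. The table contents are invented for illustration; table1.y and table2.x are the columns from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, y TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, x TEXT);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- 'a' appears twice in table2, so table1 row 1 has two matches.
INSERT INTO table2 VALUES (1, 'a'), (2, 'a'), (3, 'b');
""")

# JOIN returns one output row per matching pair.
join_ids = [r[0] for r in conn.execute("""
SELECT t1.id FROM table1 t1
JOIN table2 t2 ON t2.x = t1.y
ORDER BY t1.id
""")]

# EXISTS returns each table1 row at most once.
exists_ids = [r[0] for r in conn.execute("""
SELECT t1.id FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.y)
ORDER BY t1.id
""")]

print(join_ids)
print(exists_ids)
```

With the join, row 1 comes back twice because it has two partners in table2; with EXISTS it comes back once.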
When you join two tables, the SELECT list decides which columns you get back.
If you want data from only one table, select columns from that table only:
SELECT b.title
FROM blog b
JOIN type t ON b.type_id=t.id;
If you want data from both tables, select columns from both:
SELECT b.title,t.type_name
FROM blog b
JOIN type t ON b.type_id=t.id;
I have a left join on a MySQL database like this...
select *
from tableA
left join tableB on (tableA.id=tableB.id1 and tableB.col2='Red')
...which performs OK with >500K rows on tableA and tableB
However, changing it to this (and assuming indexes are OK)...
select *
from tableA
left join tableB on
((tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue') )
...kills it, in terms of performance.
So why the performance hit? Can I do it another way?
EDIT
I'm not really sure what you need.
Can you show some expected results?
Can you tell us what you mean by "kills it, in terms of performance" (does it go to 20 seconds of execution time?)
I don't believe it's more efficient, but try it.
select
*
from
tableA as a
left join tableB as b1
on a.id=b1.id1
and b1.col2='Red'
left join tableB as b2
on a.id=b2.id2
and b2.col2='Blue'
where
(b1.id1 is not null or b2.id2 is not null)
or (b1.id1 is null and b2.id2 is null)
You then have to pick between the b1 and b2 columns in the SELECT list with CASE WHEN...
You can compare the performance and put indexes on the appropriate columns (this depends on what the full tables and query look like, but here it should be id, id1, id2 and col2).
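Here is a rough sketch of the two-join rewrite with the CASE WHEN step filled in, run through Python's sqlite3 module. The label column and the sample rows are hypothetical. The two forms agree on this data because no tableA row matches both branches; if one did, the OR join would return two rows where the two-join form returns one combined row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY);
-- 'label' is a hypothetical payload column added for this illustration.
CREATE TABLE tableB (id1 INT, id2 INT, col2 TEXT, label TEXT);
INSERT INTO tableA VALUES (1), (2), (3);
INSERT INTO tableB VALUES (1, NULL, 'Red', 'r1'),
                          (NULL, 2, 'Blue', 'b2'),
                          (3, NULL, 'Blue', 'no-match');
""")

# Original form: a single LEFT JOIN with OR in the join condition.
or_rows = conn.execute("""
SELECT a.id, b.label
FROM tableA a
LEFT JOIN tableB b
  ON (a.id = b.id1 AND b.col2 = 'Red')
  OR (a.id = b.id2 AND b.col2 = 'Blue')
ORDER BY a.id
""").fetchall()

# Rewrite: one LEFT JOIN per branch, then CASE picks whichever matched.
split_rows = conn.execute("""
SELECT a.id,
       CASE WHEN b1.id1 IS NOT NULL THEN b1.label ELSE b2.label END AS label
FROM tableA a
LEFT JOIN tableB b1 ON a.id = b1.id1 AND b1.col2 = 'Red'
LEFT JOIN tableB b2 ON a.id = b2.id2 AND b2.col2 = 'Blue'
ORDER BY a.id
""").fetchall()

print(or_rows)
print(split_rows)
```

Each of the two joins has a simple equality condition, which is the shape an index on id1 or id2 can serve.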
Oh, I didn't notice that. How about this? I can't run it here, so you'll be better placed to judge. Note that this comma join is an inner join, so tableA rows without a match are dropped.
select *
from tableB, tableA
where (tableA.id=tableB.id1 and tableB.col2='Red') OR
(tableA.id=tableB.id2 and tableB.col2='Blue')
I've got this bit of SQL as an update script. Since I can't reference the table being updated in a clause of the statement, I've tried to work around that with subqueries, but I'm struggling to get it to work.
Essentially I need to update a value in table 1 with the sum of a field in table 2, but only where two other fields match across a couple of tables, with the restriction on the update based on field6.
UPDATE table1 W SET Field1=(SELECT field2 FROM
(SELECT A.id, B.field3, SUM(A.field2) AS field2
FROM table2 A, table3 B, table4 P
WHERE A.id=B.id AND P.field6=B.field6) B ) WHERE W.field6=B.field6
In real-world terms: sum the points conceded in the rugby games in which a player participated. Table 2 has the results (including the score), table 3 has the team sheets, and tables 1 and 4 are the same player table, which is the one to be updated.
Hopefully this is clear enough and someone can point me in the right direction.
Tried the following:
UPDATE $WSLKEEP W, $WSLFIX A, $WSLFIXPLAY B
SET W.F_CONCEDED=SUM(A.F_AGAINST)
WHERE A.F_ID=B.F_GAMEID
AND B.F_NAME=W.F_NAME"
but now stuck with:
Invalid use of group function
Kind regards
It seems like your subquery should be grouping on field6 and exposing that column for inner join with table1. Here's how you do that in MySQL:
UPDATE table1 W
INNER JOIN (
SELECT B.field6, SUM(A.field2) AS field2
FROM table2 A, table3 B, table4 P
WHERE A.id=B.id AND P.field6=B.field6
GROUP BY B.field6
) B ON W.field6=B.field6
SET W.Field1 = B.Field2
And while we are at it, I would also recommend you to refrain from (ab)using comma joins in favour of explicit joins. The latter, however unusual at first after being long accustomed to a different syntax, can very soon become habitual and much more intuitive than the former. A great deal has been said on the topic, and some people may be holding quite strong opinions about comma joins. I say, comma joins can still have their share of use. However, when you are joining on a condition, the current ANSI syntax should be your choice.
Here's the above statement with the subquery transformed so as to use explicit joins:
UPDATE table1 W
INNER JOIN (
SELECT B.field6, SUM(A.field2) AS field2
FROM table2 A
INNER JOIN table3 B ON A.id = B.id
INNER JOIN table4 P ON P.field6 = B.field6
GROUP BY B.field6
) B ON W.field6 = B.field6
SET W.Field1 = B.Field2
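For trying the idea outside MySQL, here is a sketch with Python's sqlite3 module. SQLite does not accept the UPDATE ... INNER JOIN form shown above, so this swaps in a correlated subquery that computes the same per-field6 sum. The sample rows are invented, and table4 is left out as a simplification because the question says it is the same player table as table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (field6 TEXT PRIMARY KEY, Field1 INT);
CREATE TABLE table2 (id INT, field2 INT);
CREATE TABLE table3 (id INT, field6 TEXT);
INSERT INTO table1 VALUES ('alice', 0), ('bob', 0), ('carol', 0);
INSERT INTO table2 VALUES (1, 10), (2, 20), (3, 5);
INSERT INTO table3 VALUES (1, 'alice'), (2, 'alice'), (3, 'bob');
""")

# Correlated-subquery variant (not the MySQL join syntax from the answer):
# the inner SELECT sums field2 for the row being updated, and the EXISTS
# guard leaves rows without any match untouched, like an INNER JOIN would.
conn.execute("""
UPDATE table1
SET Field1 = (SELECT SUM(A.field2)
              FROM table2 A
              INNER JOIN table3 B ON A.id = B.id
              WHERE B.field6 = table1.field6)
WHERE EXISTS (SELECT 1
              FROM table2 A
              INNER JOIN table3 B ON A.id = B.id
              WHERE B.field6 = table1.field6)
""")

rows = conn.execute(
    "SELECT field6, Field1 FROM table1 ORDER BY field6"
).fetchall()
print(rows)
```

The aggregate lives inside a subquery in both variants, which is what avoids the "Invalid use of group function" error.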
For an update query like the one above, MySQL lets you list multiple tables in the UPDATE clause, even if you aren't updating all of them. When the new value comes straight from a joined row, this makes sub-queries unnecessary and can speed up execution. For example, you can do something like this.
UPDATE table1 W, table2 A, table3 B, table4 P
SET W.Field1 = A.field2 ...
Note that an aggregate such as SUM(A.field2) cannot be used directly in this SET clause; that is what produces the "Invalid use of group function" error quoted in the question, so a summed value still needs a grouped subquery. I'm unclear on the specifics of what you are trying to update exactly, but I just wanted to point out that you can often avoid sub-queries by using this kind of syntax.
Can anyone explain to me why AS id, AS r2 and USING (id) are used in this query:
SELECT id, username FROM friendship JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM friendship)) AS id) AS r2 USING (id);
I just want to know the purpose of using them, why using them?
AS creates an alias: AS id names the column returned by the inner SELECT, and AS r2 names the subquery itself. That way the same name, id, can be used with the USING clause.
Basically the query joins the friendship table to the subquery on the id column. Since the subquery returns only one row, it matches at most one id in the friendship table.
OscarMk already explained the AS, but USING (id) is the equivalent of ON table1.id = table2.id. With USING (a bit redundant here...), your RDBMS basically merges the two columns into a single one, which means you don't have to tell your RDBMS whether you want the column from table1 or table2. Simple example:
SELECT table1.id
FROM table1
INNER JOIN table2
ON table1.id = table2.id;
SELECT id
FROM table1
INNER JOIN table2
USING (id);
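Those queries are equivalent. If you had written just id instead of table1.id in the first query, your RDBMS would have raised an error, because id would be ambiguous between the two tables.
You can sometimes use a NATURAL JOIN instead of an INNER JOIN and leave out the USING or ON clause entirely. Be aware that NATURAL JOIN joins on every column that has the same name in both tables, so it matches USING (id) only when id is the only shared column name.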
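The differences between ON, USING and NATURAL JOIN can be tried out with a short sketch in Python's sqlite3 module. The name column and the sample rows are assumptions added to show what happens when the tables share more than one column name; other engines word the ambiguity error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Both tables share 'id' and 'name'; 'name' is added for illustration.
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
INSERT INTO table2 VALUES (1, 'x'), (2, 'z');
""")

# ON: the column must be qualified with its table name.
on_rows = conn.execute("""
SELECT table1.id FROM table1
INNER JOIN table2 ON table1.id = table2.id
ORDER BY table1.id
""").fetchall()

# USING: the join column is merged, so a bare 'id' is fine.
using_rows = conn.execute("""
SELECT id FROM table1
INNER JOIN table2 USING (id)
ORDER BY id
""").fetchall()

# NATURAL JOIN: joins on ALL shared column names (id AND name here).
natural_rows = conn.execute("""
SELECT id FROM table1
NATURAL JOIN table2
ORDER BY id
""").fetchall()

# Unqualified 'id' with ON is ambiguous and is rejected.
try:
    conn.execute(
        "SELECT id FROM table1 INNER JOIN table2 ON table1.id = table2.id"
    )
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

print(on_rows, using_rows, natural_rows, ambiguous_error)
```

The NATURAL JOIN drops the row with id 2 because the name values differ, which is the kind of silent change that makes an explicit USING or ON safer.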