SQL - The fastest query for multiple INNER JOINs [closed] - mysql

Which query is faster (or does it not matter)?
SELECT *
FROM students as s
INNER JOIN hallprefs as hp
ON s.studentid = hp.studentid
INNER JOIN halls as h
ON hp.hallid = h.hallid
or
SELECT *
FROM students as s
INNER JOIN hallprefs as hp
INNER JOIN halls as h
ON hp.hallid = h.hallid
AND s.studentid = hp.studentid
Of course, the original query involves many more tables.

The comments have all alluded to the same points: it shouldn't matter in terms of performance, and the second query is not ANSI compliant. MySQL allows it because:
In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.
extracted from online documentation
So the ANSI equivalent of the second query is:
SELECT *
FROM students as s
CROSS JOIN hallprefs as hp
INNER JOIN halls as h
ON hp.hallid = h.hallid
AND s.studentid = hp.studentid;
Again, this rewrite should have no impact on performance. SQL is a declarative language: you tell the engine what you want, not how to get it. Since the intention of the two queries is exactly the same, one would hope the optimiser arrives at the same plan for both. This is of course not always the case, although I am pretty certain it will be for all DBMSs in simple cases like this.
When it comes to SQL, the answer to which is fastest or most efficient is almost always: it depends. It will depend on your schema, indexes, data types, data distribution, and database vendor/version. So while general guidelines can be given, the real answer is to test.
As to which is better practice, it really depends on your intentions. The problem with the cross join form is that you might later decide you only want to left join to halls, and adapt your query accordingly:
SELECT *
FROM students as s
CROSS JOIN hallprefs as hp
LEFT JOIN halls as h
ON hp.hallid = h.hallid
AND s.studentid = hp.studentid;
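This introduces a Cartesian product between students and hallprefs, because a LEFT JOIN keeps every row from its left side even when the ON condition fails. The same change to the first query does not do this: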
SELECT *
FROM students as s
INNER JOIN hallprefs as hp
ON s.studentid = hp.studentid
LEFT JOIN halls as h
ON hp.hallid = h.hallid;
Now, the intention could have been to have the Cartesian product, in which case the cross join solution is better for this situation. Once again, it depends, and your mileage may vary.
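The difference between the four variants above can be checked on a toy dataset. The following is a minimal sketch, assuming a made-up three-table schema and using Python's built-in sqlite3 module rather than MySQL (the join semantics shown here are the same, but plans and timings are not comparable). The column lists and sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical schema mirroring the question; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students  (studentid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE halls     (hallid INTEGER PRIMARY KEY, hallname TEXT);
    CREATE TABLE hallprefs (studentid INTEGER, hallid INTEGER);
    INSERT INTO students  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO halls     VALUES (10, 'North'), (20, 'South');
    INSERT INTO hallprefs VALUES (1, 10), (2, 10), (2, 20);
""")

COLUMNS = "s.studentid, hp.studentid, hp.hallid, h.hallname"

def rows(from_clause):
    """Run the same SELECT list against a given FROM clause."""
    return conn.execute(f"SELECT {COLUMNS} FROM {from_clause}").fetchall()

# One ON clause per join.
inner_each = rows("""students AS s
    INNER JOIN hallprefs AS hp ON s.studentid = hp.studentid
    INNER JOIN halls AS h ON hp.hallid = h.hallid""")

# Both predicates moved to the last join (the ANSI form of the second query).
cross_inner = rows("""students AS s
    CROSS JOIN hallprefs AS hp
    INNER JOIN halls AS h
        ON hp.hallid = h.hallid AND s.studentid = hp.studentid""")

# Switching the last join to LEFT JOIN in each form.
left_each = rows("""students AS s
    INNER JOIN hallprefs AS hp ON s.studentid = hp.studentid
    LEFT JOIN halls AS h ON hp.hallid = h.hallid""")

cross_left = rows("""students AS s
    CROSS JOIN hallprefs AS hp
    LEFT JOIN halls AS h
        ON hp.hallid = h.hallid AND s.studentid = hp.studentid""")

print(len(inner_each), len(cross_inner), len(left_each), len(cross_left))
```

With this data the two inner join forms return the same 3 rows, while the CROSS JOIN plus LEFT JOIN form returns all 6 student/preference pairs (2 students times 3 preferences), with NULL hall columns for the pairs that do not belong together.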

Related

Is there any difference, performance wise, with these two queries? (Repeating the where clause inside the sub-query) MYSQL

I have a query that goes something like this.
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
As you can see, the outer query has a WHERE clause.
But on the inside, we also have an inner join against the derived table SELECT INNER_E.* FROM Equipment INNER_E. This inner join means we only retrieve the fault codes whose equipment exists in the Equipment table (correct me if I'm wrong).
I am trying to optimize this query.
My question is, does it make any difference to do this
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
So repeating the where clause inside the inner sub query, to further limit it before it joins. Or does the optimizer know to do this automatically?
I tried implementing that line in code, and it seemed to only make my query slower strangely enough. Is there any way I can optimize that query above, or since it's pretty simple, is that the best it's going to get without indexes?
I tried running the Explain Select statement, but I have a hard time parsing what it's telling me. Are there any good resources I can look into to learn some tips or techniques to optimize my query?
I don't have any aggregate functions in my Select fields. So is the only real answer Indexes?
Why is the first subquery needed? Perhaps simply
Select *
FROM FaultCode FC
JOIN Equipment AS E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type
AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057
AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN';
Likely Indexes:
FC: INDEX(code_status, EquipmentID)
E: INDEX(id_organization, equipment_status, EquipmentID)
Probably unwise to do SELECT * -- It will give you all the columns of all 4 tables. (Without further details, I cannot suggest any "covering" indexes, which seems likely for AT.)
With my version of the query, your question about repeating the WHERE vanishes. With your version, it is likely to help. I don't think the Optimizer is smart enough to catch on to what you are doing.
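Whatever the optimizer does with them, the three formulations discussed here (derived table, derived table with the repeated WHERE, and the plain join) should return the same rows, so the choice is purely about performance. A minimal sketch to confirm that on a toy dataset, using Python's sqlite3 module instead of MySQL; the cut-down schema, the FaultCodeID column, the index names, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Equipment (
        EquipmentID INTEGER PRIMARY KEY,
        id_organization INTEGER,
        equipment_status TEXT
    );
    CREATE TABLE FaultCode (
        FaultCodeID INTEGER PRIMARY KEY,  -- hypothetical key column
        EquipmentID INTEGER,
        code_status TEXT
    );
    -- The indexes suggested in the answer; the index names are made up.
    CREATE INDEX fc_status_equipment ON FaultCode (code_status, EquipmentID);
    CREATE INDEX e_org_status
        ON Equipment (id_organization, equipment_status, EquipmentID);
    INSERT INTO Equipment VALUES
        (1, 100057, 'ACTIVE'), (2, 100057, 'RETIRED'), (3, 42, 'ACTIVE');
    INSERT INTO FaultCode VALUES
        (1, 1, 'OPEN'), (2, 1, 'CLOSED'), (3, 2, 'OPEN'), (4, 3, 'OPEN');
""")

OUTER_WHERE = """
    WHERE E.id_organization = 100057
      AND E.equipment_status = 'ACTIVE'
      AND FC.code_status = 'OPEN'
    ORDER BY FC.FaultCodeID
"""

# Plain join, as suggested in the answer.
plain_join = """
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN Equipment AS E USING (EquipmentID)
""" + OUTER_WHERE

# Derived table without the inner filter (the original query).
derived = """
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN (SELECT INNER_E.* FROM Equipment INNER_E) E USING (EquipmentID)
""" + OUTER_WHERE

# Derived table with the WHERE clause repeated inside.
derived_filtered = """
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN (
        SELECT INNER_E.* FROM Equipment INNER_E
        WHERE INNER_E.id_organization = 100057
          AND INNER_E.equipment_status = 'ACTIVE'
    ) E USING (EquipmentID)
""" + OUTER_WHERE

results = [conn.execute(q).fetchall()
           for q in (plain_join, derived, derived_filtered)]
print(results)
```

All three lists are identical for this data. Which form is fastest on a real MySQL server still has to be measured there, for example by comparing the EXPLAIN output of each.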
Show us the EXPLAINs. We can help some with what the cryptic stuff is saying. (And what it is not saying.)
"the best it's going to get without indexes" -- Are you saying you have no indexes??! Not even a PRIMARY KEY for each table? "So is the only real answer Indexes?" Every time you write a query against a non-tiny table, you should ask "do the table(s) have adequate indexes for this query?"

Select some data even if a condition isn't met

I have been trying to figure this out for a while -- I'm working on a simulation (run in PHP because I hate myself). I've gotten the thing up to the point where I can start adding in "Viruses" and such.
Right now, I'm working on a 'virus' that limits the fertility of citizens. I've got it all working perfectly, save for the actual method that finds out whether they are 'infertile' or not.
I'm trying to pull data from the database using a query similar to this (note that this query is semi-working, but I can't figure out how to make it work properly):
SELECT g.infert1 as gert,
v.infert1 as vert
FROM genetics as g,
virus as v,
citizens as c
WHERE c.cid = 1
AND g.cid = c.cid
AND v.vid = c.infected
The GERT and VERT data come from two tables (Genetics and Virus). The virus table contains most of the same columns as the genetics table (Imm00-12, infert1, etc.). Both sets of data select correctly if a 'citizen' is infected; however, if they are not, the result comes back empty and causes an error.
I'm trying to figure out if there is a way to get conditional data, so that if a citizen is not infected the query still selects the GERT information and just returns null for VERT, as opposed to returning nothing at all.
Please try the following...
SELECT genetics.infert1 AS gert,
virus.infert1 AS vert
FROM citizens
JOIN genetics ON citizens.cid = genetics.cid
LEFT JOIN virus ON virus.vid = citizens.infected
WHERE citizens.cid = 1;
This statement starts by performing an INNER JOIN between citizens and genetics based on their common value of cid. This effectively prepends the fields of citizens to those of genetics for each citizen.
It then performs a LEFT JOIN between this joined dataset and virus where virus.vid = citizens.infected. This has the effect of appending the values of the fields of a virus that a citizen is afflicted with along with the fields for that citizen to each record from genetics for that citizen. If the citizen is not afflicted with a virus, then NULL values are used for the virus fields.
The resulting dataset is then refined to just those records where the value of citizens.cid is 1.
If you wish to produce the same list but for all citizens, then remove the WHERE clause.
Using INNER JOIN and LEFT JOIN is a more modern way of joining than using comma-separated tables (an implicit CROSS JOIN) refined by the WHERE clause. Conceptually, a CROSS JOIN between two tables, written as tblTable1, tblTable2, appends a copy of every record from tblTable2 to every record from tblTable1, and the resulting (often very large) dataset is then refined by the WHERE clause. With INNER JOIN and LEFT JOIN, only those records that meet the ON criteria are joined. In practice most optimisers turn an equality in the WHERE clause into a join condition, so the two inner join forms usually perform the same, but only the explicit syntax can express the LEFT JOIN needed here.
If you have any questions or comments, then please feel free to post a Comment accordingly.
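To see the behaviour described above, here is a minimal sketch that runs the original comma join and the LEFT JOIN version side by side. It uses Python's sqlite3 module rather than MySQL, and the reduced table definitions and sample rows are assumptions: citizen 1 is not infected (infected is NULL) and citizen 2 carries a hypothetical virus 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE citizens (cid INTEGER PRIMARY KEY, infected INTEGER);
    CREATE TABLE genetics (cid INTEGER, infert1 INTEGER);
    CREATE TABLE virus    (vid INTEGER PRIMARY KEY, infert1 INTEGER);
    INSERT INTO citizens VALUES (1, NULL), (2, 7);
    INSERT INTO genetics VALUES (1, 0), (2, 1);
    INSERT INTO virus    VALUES (7, 1);
""")

# The question's query: every table must contribute a matching row.
COMMA_JOIN = """
    SELECT g.infert1 AS gert, v.infert1 AS vert
    FROM genetics AS g, virus AS v, citizens AS c
    WHERE c.cid = ? AND g.cid = c.cid AND v.vid = c.infected
"""

# The answer's query: virus columns become NULL when nothing matches.
LEFT_JOIN = """
    SELECT genetics.infert1 AS gert, virus.infert1 AS vert
    FROM citizens
    JOIN genetics ON citizens.cid = genetics.cid
    LEFT JOIN virus ON virus.vid = citizens.infected
    WHERE citizens.cid = ?
"""

def fetch(sql, cid):
    return conn.execute(sql, (cid,)).fetchall()

healthy_comma = fetch(COMMA_JOIN, 1)    # no row at all
healthy_left = fetch(LEFT_JOIN, 1)      # gert present, vert is None
infected_comma = fetch(COMMA_JOIN, 2)
infected_left = fetch(LEFT_JOIN, 2)

print(healthy_comma, healthy_left, infected_comma, infected_left)
```

For the uninfected citizen the comma join returns an empty list while the LEFT JOIN returns one row with None for vert; for the infected citizen both queries agree.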
Something like this is probably what you want.
SELECT g.infert1 as gert,
cv.vert as vert
FROM
genetics as g
LEFT JOIN
( SELECT c.cid as cid, v.infert1 as vert
FROM virus as v
INNER JOIN citizens as c ON v.vid = c.infected
WHERE c.cid = 1
) AS cv
ON g.cid = cv.cid
WHERE g.cid = 1
Here I use SQL JOIN syntax:
INNER JOIN means "give me rows where they match exactly", which is what your original SQL does, but we want that only for virus vs. citizens.
LEFT JOIN means "give me rows from the left-hand table, and if there are any matches in the other, include them too", which is what I think you want.
You may wish to re-evaluate whether citizens can only have one virus though.

SQL - Query Assistance [closed]

I'm pretty new to SQL and I am working on an assignment that requires me to find what movies were created by a company with "films" in its name based off the IMDB database.
A diagram of this database can be viewed here:
http://i.imgur.com/kj8qVgF.png
This is the query I was working on.
SELECT t.title, t.id
FROM title t, movie_link m
JOIN movie_companies c ON (m.movie_id = c.movie_id)
JOIN company_name n USING (id)
WHERE n.name LIKE '%films%'
Your code does seem to be somewhat close to what you'll need, but there are some odd things in it. First of all, the implicit join title t, movie_link m performs a Cartesian product between those tables. This code should work:
SELECT DISTINCT t.id, t.title
FROM title t
INNER JOIN movie_companies mc
ON t.id = mc.movie_id
INNER JOIN company_name cn
ON mc.company_id = cn.id
WHERE cn.name LIKE '%films%'
There are several things that you need to take into account from that diagram to get the results that you want. For instance, since different companies can be the creator of the same movie (the diagram allows it), you'll need to use DISTINCT otherwise you might get duplicate results.
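The effect of DISTINCT can be seen with a minimal sketch in which one movie is linked to two companies whose names both contain "films". It uses Python's sqlite3 module rather than MySQL, and the three reduced tables and their rows are assumptions made for illustration, not the real IMDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title           (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE company_name    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movie_companies (movie_id INTEGER, company_id INTEGER);
    INSERT INTO title VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO company_name VALUES
        (10, 'acme films'), (11, 'indie films ltd'), (12, 'big studio');
    -- 'Alpha' has two matching companies, 'Beta' has none.
    INSERT INTO movie_companies VALUES (1, 10), (1, 11), (2, 12);
""")

JOINS = """
    FROM title t
    INNER JOIN movie_companies mc ON t.id = mc.movie_id
    INNER JOIN company_name cn ON mc.company_id = cn.id
    WHERE cn.name LIKE '%films%'
"""

without_distinct = conn.execute("SELECT t.id, t.title" + JOINS).fetchall()
with_distinct = conn.execute("SELECT DISTINCT t.id, t.title" + JOINS).fetchall()

print(without_distinct)
print(with_distinct)
```

Without DISTINCT the movie appears once per matching company; with DISTINCT it appears once.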
Seems like you are missing a join predicate for the join between title and movie_link tables.
I recommend you NOT mix the "comma" and the "JOIN keyword" syntax in a statement. That is, remove the comma between title and movie_link and replace it with the keyword JOIN. And add a join predicate, unless you want a cross join.
Also, the USING syntax in the join predicate is a little ambiguous for my taste. I recommend you explicitly specify which columns from which tables you want to match, like you did for the join to movie_companies table.
The query below isn't runnable. The ??? need to be replaced with appropriate identifiers and aliases:
SELECT t.title
, t.id
FROM title t
JOIN movie_link m ON m.??? = t.???
JOIN movie_companies c ON c.movie_id = m.movie_id
JOIN company_name n ON n.id = ???.???
WHERE n.name LIKE '%films%'

Position of `INNER JOIN` filtering conditions in a query; `ON` or `WHERE` clause [closed]

There is one answer on this question that touches on this, but I feel it deserves a question of its own.
This question, which is marked as a duplicate of the first but isn't really, is what I want to ask, and as it says in the duplicate notice:
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
So I'm asking a new question.
I can write a query as:
SELECT *
FROM customer_order co
JOIN customer c
ON c.id = co.customer_id
AND c.type = 'special'
JOIN order o
ON o.id = co.order_id
AND o.status = 'dispatched'
OR:
SELECT *
FROM customer_order co
JOIN customer c
ON c.id = co.customer_id
JOIN order o
ON o.id = co.order_id
WHERE c.type = 'special'
AND o.status = 'dispatched'
I absolutely prefer the first way, especially in more complex queries as it groups the conditions with the tables on which they operate, which makes it easier for me to read and to identify appropriate composite indexes. It also means that if I want to change to a LEFT JOIN (or maybe RIGHT JOIN, I don't really use RIGHT JOIN), all the conditions are in the right place.
There seems to be some preference, however, in the community towards the second way.
Does anybody know if this preference is grounded, perhaps in some performance issue or in some readability issue that I have yet to stumble across? Or can I continue to be a rebel happily?
They are both exactly the same. The only deciding factor is what standards you use in your project. You need to decide what is more readable for you and go with that. For example the way you format your queries is not what I would do.
I would do
SELECT
*
FROM
customer_order co
INNER JOIN customer c ON
c.id = co.customer_id AND
c.type = 'special'
INNER JOIN order o ON
o.id = co.order_id AND
o.status = 'dispatched'
There is no difference between mine and yours except that I feel mine is more readable. As a rule of thumb, I generally reserve the WHERE clause for conditions that relate to the base table. Also, the first column in each join condition is the one from the table being joined (e.g. o.id or c.id). These are all things I do just to keep consistency. Another developer might prefer to have all conditions in the WHERE clause. It's simply preference.
Regarding your thoughts on the community, I think most people would agree that consistency is key. Make sure you document your methodology for other developers and go with that. If performance was being affected this would be a different discussion, but it's not.
Carry on what you are doing, but make sure it's consistent!
Also, for questions like this I think the code review forum is a better place and guys will be less likely to vote your question down.
For an inner join, the two forms are equivalent in execution, even though they read differently. Query optimizers evaluate the criteria in both the WHERE clause and the FROM clause and consider all of them when building the query plan, in order to reach the most efficient execution plan. So you can use whichever form you like.
Also, as you noted, it is worth remembering that when an inner join is replaced with a left or right join the two forms are no longer equivalent: filters on the joined table then need to stay in the ON clause.
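The point about ON versus WHERE can be checked with a small experiment. This is a minimal sketch using Python's sqlite3 module rather than the asker's database; it keeps only the customer_order and customer tables, and the columns and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer       (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE customer_order (customer_id INTEGER, order_id INTEGER);
    INSERT INTO customer VALUES (1, 'special'), (2, 'regular');
    INSERT INTO customer_order VALUES (1, 100), (2, 200);
""")

def run(join_kind, filter_in_on):
    """Build the query with the type filter either in ON or in WHERE."""
    on = "c.id = co.customer_id"
    where = ""
    if filter_in_on:
        on += " AND c.type = 'special'"
    else:
        where = "WHERE c.type = 'special'"
    sql = f"""
        SELECT co.order_id, c.id
        FROM customer_order co
        {join_kind} JOIN customer c ON {on}
        {where}
        ORDER BY co.order_id
    """
    return conn.execute(sql).fetchall()

inner_on = run("INNER", filter_in_on=True)
inner_where = run("INNER", filter_in_on=False)
left_on = run("LEFT", filter_in_on=True)
left_where = run("LEFT", filter_in_on=False)

print(inner_on, inner_where)
print(left_on, left_where)
```

With INNER JOIN both placements return only order 100. With LEFT JOIN, the filter in ON keeps order 200 with a NULL customer, while the filter in WHERE discards that row again, which effectively turns the outer join back into an inner join.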

Smart way of writing sql Join [closed]

I was exploring someone's code and found this query. It fetches data using a join, but without any INNER JOIN, OUTER JOIN, LEFT JOIN or RIGHT JOIN keywords; the programmer simply wrote the query below.
It looks like a smart way of writing the query, if the query is right.
SELECT
a.*,b.*,c.*,d.*,e.*
FROM
property_photos a,property_promotions b,properties c, communities d,cities e
WHERE
a.property_id = b.property_id AND
a.property_id = c.id AND
(b.promotion_id =2 OR b.promotion_id =1) AND
a.mainphoto="Y" AND
d.id=c.community_id AND
e.id = c.city_id AND
a.featured_status = 1
Is this query correct?
If this query is right, then I think this is a better way to write queries, since it will save a lot of disk space compared with using keywords like LEFT JOIN / RIGHT JOIN, which make a query long.
SMARTY WAY
Yes it is alright. It is the same as an INNER JOIN.
Note:
Which Syntax to Use? Per the ANSI SQL specification, use of the INNER JOIN syntax is preferable. Furthermore, although using the WHERE clause to define joins is indeed simpler, using explicit join syntax ensures that you will never forget the join condition, and it can affect performance, too (in some cases).
Source: Sam's - MySQL Crash Course, page 139
Yes, this is known as implicit JOIN syntax. It fell out of favour once ANSI-92 defined an explicit JOIN syntax, which is an order of magnitude easier to read as your queries get more complex.
From Wikipedia:
SQL specifies two different syntactical ways to express joins: "explicit join notation" and "implicit join notation".
The "explicit join notation" uses the JOIN keyword to specify the table to join, and the ON keyword to specify the predicates for the join, as in the following example:
SELECT *
FROM employee
INNER JOIN department ON employee.DepartmentID = department.DepartmentID;
The "implicit join notation" simply lists the tables for joining, in the FROM clause of the SELECT statement, using commas to separate them. Thus it specifies a cross join, and the WHERE clause may apply additional filter-predicates (which function comparably to the join-predicates in the explicit notation).
The following example is equivalent to the previous one, but this time using implicit join notation:
SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;
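The equivalence of the two notations, and what happens when the WHERE predicate is forgotten in the implicit form, can be checked with a minimal sketch. It uses Python's sqlite3 module rather than MySQL; the LastName and DepartmentName columns and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
    CREATE TABLE employee   (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering');
    -- Williams has no department and drops out of both inner joins.
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Williams', NULL);
""")

SELECT = "SELECT employee.LastName, department.DepartmentName "

explicit = conn.execute(SELECT + """
    FROM employee
    INNER JOIN department ON employee.DepartmentID = department.DepartmentID
    ORDER BY employee.LastName
""").fetchall()

implicit = conn.execute(SELECT + """
    FROM employee, department
    WHERE employee.DepartmentID = department.DepartmentID
    ORDER BY employee.LastName
""").fetchall()

# Implicit notation with the join predicate left out: a plain cross join.
forgotten_where = conn.execute(SELECT + "FROM employee, department").fetchall()

print(explicit)
print(implicit)
print(len(forgotten_where))
```

Both notations return the same two rows. Dropping the WHERE clause from the implicit form silently yields every employee paired with every department (3 times 2 rows), which is the mistake the explicit ON clause makes harder to commit.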