How does a SQL join actually work? - mysql

Suppose I have two tables, a Gardners table and a Plantings table.
Suppose my query is:
SELECT gid, first_name, last_name, pid, gardener_id, plant_name
FROM Gardners
INNER JOIN Plantings
ON gid = gardener_id
I want to know exactly how it works internally.
As per my understanding, in every join:
Each row from the Gardners table is compared with each row of the Plantings table. If the condition matches, the combined row is output. Is my understanding correct?
In terms of a program, think of it as:
for i in [Gardners Table]:
    for j in [Plantings Table]:
        if i.gid == j.gardener_id:
            print <>
Now suppose your query is something like:
User(uid,uname,ucity,utimedat)
Follows(uid,followerid) // uid and followerid are pointing to `uid` of `User`.
SELECT U.uid, U.uname FROM User U1 JOIN Follows F,User U2 WHERE
F.followerid = U2.uiddate AND U2.ucity = 'D'
How will the join condition work internally here? Is it equivalent to:
for i in [USER]:
    for j in [Follows]:
        for k in [USER]:
            if condition:
                print <>

Your example with the Gardners table and the Plantings table is correct, but the example with users is not so obvious.
I think what you want to get is a user's followers from some city.
Assuming correct query is:
SELECT U1.uid, U2.uname
FROM User U1
JOIN Follows F ON U1.uid = F.uid
JOIN User U2 ON F.followerid = U2.uid
WHERE U2.ucity = 'D'
Then in pseudocode it will look like this:
for i in [User Table]:
    for j in [Follows Table]:
        if i.uid == j.uid:
            for k in [User Table]:
                if j.followerid == k.uid and k.ucity == 'D':
                    print <>
SQL Fiddle for this: http://sqlfiddle.com/#!9/caeb1e/5
A very good picture of how joins actually work can be found here: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
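To check that the three nested loops really describe the same result as the corrected query, here is a small sketch in Python. It uses SQLite in place of MySQL, and the sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
users = [(1, "ann", "D"), (2, "bob", "D"), (3, "cy", "X")]  # (uid, uname, ucity)
follows = [(1, 2), (1, 3), (3, 1)]                          # (uid, followerid)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (uid INTEGER, uname TEXT, ucity TEXT)")
conn.execute("CREATE TABLE Follows (uid INTEGER, followerid INTEGER)")
conn.executemany("INSERT INTO User VALUES (?, ?, ?)", users)
conn.executemany("INSERT INTO Follows VALUES (?, ?)", follows)

sql_rows = conn.execute(
    """
    SELECT U1.uid, U2.uname
    FROM User U1
    JOIN Follows F ON U1.uid = F.uid
    JOIN User U2 ON F.followerid = U2.uid
    WHERE U2.ucity = 'D'
    """
).fetchall()

# The same logic written as the three nested loops from the pseudocode.
loop_rows = []
for i in users:
    for j in follows:
        if i[0] == j[0]:
            for k in users:
                if j[1] == k[0] and k[2] == "D":
                    loop_rows.append((i[0], k[1]))

print(sorted(sql_rows))
print(sorted(loop_rows))
```

Both lists contain the same pairs. The database is free to reach that result in a different order or with a different algorithm; the loops only describe what the result is.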

In your second query, it's not clear exactly what you're trying to do, as the syntax is erroneous; but if I were to guess, your intention is to join User U1 with a subquery that does an (implicit) join between Follows F and User U2.
If my guess is correct, the query would probably look more like this:
SELECT U1.uid, U1.uname
FROM User U1 JOIN
(SELECT U2.uid
FROM Follows F, User U2
WHERE F.followerid = U2.uiddate AND U2.ucity = 'D') T
WHERE U1.uid = T.uid
This is not a 'best practice' way of writing the query either: you should use explicit joins, and there is no need for a subquery because you can simply join the three tables.
But I wrote it this way to keep it closest to your original query.
And if my guess is correct, then your pseudocode would be more like:
for u2 in [User 2 where condition]:
    for f in [Follows]:
        if f.uid == u2.uid:
            SELECT uid AS T
for u1 in [User 1]:
    if u1.uid == T.uid:
        print <>
However, this is not a complete picture. One key to understanding SQL is to think in terms of sets of data being filtered rather than objects being selected one by one, because SQL operates on sets, which you might not be used to.
So a number of the steps above will be executed in one go rather than sequentially. Other than that, look at the answer given by Yuri Tkachenko above for how to think about joins; the internals come second to writing correct joins.
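To make the "sets, not sequential steps" point concrete, here is a rough Python sketch with made-up rows. It computes the same join twice: once with nested loops and once with a hash table built on the join key. The function names are mine, and which strategy a real engine picks is up to its optimizer; the point is only that the result set is defined by the query, not by the loop structure.

```python
from collections import defaultdict

# Hypothetical rows: (gid, first_name) and (pid, gardener_id, plant_name).
gardners = [(1, "Ann"), (2, "Bob")]
plantings = [(10, 1, "Rose"), (11, 1, "Fern"), (12, 3, "Mint")]

def nested_loop_join(left, right):
    # Compare every left row with every right row.
    return [g + p for g in left for p in right if g[0] == p[1]]

def hash_join(left, right):
    # Build phase: index the left rows by the join key.
    by_gid = defaultdict(list)
    for g in left:
        by_gid[g[0]].append(g)
    # Probe phase: look up each right row's key instead of rescanning the left table.
    return [g + p for p in right for g in by_gid.get(p[1], [])]

print(sorted(nested_loop_join(gardners, plantings)))
print(sorted(hash_join(gardners, plantings)))
```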

Yes, your understanding is correct if you are only talking about a plain (inner) join, not the other join types such as outer joins.

Related

How do I join third table values into the main join?

I have this query (https://www.db-fiddle.com/f/32Kc3QisUEwmSM8EmULpgd/1):
SELECT p.prank, d.dare
FROM dares d
INNER JOIN pranks p ON p.id = d.prank_id
WHERE d.condo_id = 1;
I have one condo with id 1. It has a unique connection to dares, which is connected to pranks, and a unique connection to condos_pranks.
I want all unique pranks from both tables. I used the query above to get the relation of dares to pranks; the expected result was L, M, N with Yes, No, Maybe, and that is correct. But I also want the pranks in condos_pranks, whose ids are 1, 4, 5, 6 = L, O, P, Q.
So I tried joining the table with a LEFT JOIN, because there might not be a condos_pranks row:
SELECT p.prank, d.dare
FROM dares d
INNER JOIN pranks p ON p.id = d.prank_id
LEFT JOIN condos_pranks pd ON pd.condo_id = d.condo_id AND pd.prank_id = p.id
WHERE d.condo_id = 1;
But the result is the same as the first one, and what I want is:
prank  dare
L      Yes
M      No
N      Maybe
O      No
P      No
Q      No
with the default being No = 2 if the prank_id from condos_pranks is not in dares.
How do I connect it?
This seems like an exercise in identifying extraneous information more than anything. You cannot join to a table on a key that is not there; however, if you know your default, you can use something like COALESCE to find the records where there was nothing to join (the joined columns come back as NULL) and replace them with your default.
I mentioned in a comment above that this table schema makes little sense. You have keys all over the place that have all sorts of circular references. If this is your own schema, consider stopping here and revisiting the relationships. If it is not, and it is something educational, which I suspect it is, disregard that and recognize the logical flaws in what you are working with. Perhaps consider taking the data provided and creating a new, more normalized table schema that uses other tables to handle the many-to-many and one-to-many relationships.
dbfiddle
SELECT
pranks.prank,
COALESCE(dares.dare, 'No')
FROM pranks LEFT OUTER JOIN
dares ON pranks.id = dares.prank_id
ORDER BY pranks.prank ASC;
clearlyclueless gave a correct explanation.
To achieve the result, the following SELECT can also be used:
SELECT
pranks.prank,
case
when dare is null then 'No'
else dare
end
FROM pranks LEFT OUTER JOIN
dares ON pranks.id = dares.prank_id
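For anyone who wants to try the LEFT JOIN plus COALESCE approach without the fiddle, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The rows are my guess at the sample data based on the expected output in the question (pranks L to Q with ids 1 to 6, dares only for L, M, N), and the dares table is reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed sample data, reconstructed from the expected output above.
    CREATE TABLE pranks (id INTEGER PRIMARY KEY, prank TEXT);
    CREATE TABLE dares (id INTEGER PRIMARY KEY, prank_id INTEGER, dare TEXT);
    INSERT INTO pranks VALUES (1,'L'),(2,'M'),(3,'N'),(4,'O'),(5,'P'),(6,'Q');
    INSERT INTO dares VALUES (1,1,'Yes'),(2,2,'No'),(3,3,'Maybe');
    """
)

# Pranks without a matching dare get NULL from the LEFT JOIN,
# and COALESCE turns that NULL into the default 'No'.
rows = conn.execute(
    """
    SELECT pranks.prank, COALESCE(dares.dare, 'No') AS dare
    FROM pranks
    LEFT OUTER JOIN dares ON pranks.id = dares.prank_id
    ORDER BY pranks.prank ASC
    """
).fetchall()

for prank, dare in rows:
    print(prank, dare)
```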

SQL JOIN query needs over 15s to run

I have a pretty big SQL query to get data from multiple database tables. I use the ON condition to check that the guild_ids are always the same, and in some cases it checks for a user_id too.
That is my query:
SELECT
SUM( f.guild_id = 787672220503244800 AND f.winner_id LIKE '%841827102331240468%' ) AS guild_winner,
SUM( f.winner_id LIKE '%841827102331240468%' ) AS win_sum,
m.message_count,
r.bypass_role_id,
i.real_count,
i.total_count,
i.bonus_count,
i.left_count
FROM
guild_finished_giveaways AS f
JOIN guild_message_count AS m
JOIN guild_role_settings AS r
JOIN guild_invite_count AS i ON m.guild_id = f.guild_id
AND m.user_id = 841827102331240468
AND r.guild_id = f.guild_id
AND i.guild_id = f.guild_id
AND i.user_id = m.user_id
But it runs pretty slow, with over 15s. I can't see why it needs so long.
I figured out that if I remove the "guild_invite_count" JOIN, it's pretty fast again. Do I have some simple error here that I don't see? Or what could be the issue?
Each JOIN expression needs its own ON. Don't wait until the end for this. As it was, the server was forced to build up a cartesian product of all those tables before narrowing them down again, and I'm surprised the query ran at all (I'd expect a syntax error for missing ON clauses).
FROM guild_finished_giveaways AS f
JOIN guild_message_count AS m ON m.guild_id = f.guild_id
JOIN guild_role_settings AS r ON r.guild_id = f.guild_id
JOIN guild_invite_count AS i ON i.guild_id = f.guild_id
AND i.user_id = m.user_id
WHERE m.user_id = 841827102331240468
It's also more than a little odd to use SUM() or any other aggregate function in the same query as non-aggregated values without a GROUP BY clause.
Are you using InnoDB?
Does every table have a PRIMARY KEY?
These may help:
m: PRIMARY KEY(user_id) -- assuming that is unique in that table
f: INDEX(guild_id, winner_id)
r: INDEX(guild_id, bypass_role_id)
i: INDEX(user_id,)
It looks like some tables should not be separate -- perhaps r,i,f could be combined? (I need to see SHOW CREATE TABLE to say more.)
Do NOT have a commalist in winner_id. Instead have another table with one row per winner per game (or whatever it is a winner of). Perhaps just two columns, like a many-to-many mapping table.
Noting that the execution is likely to start with m and then go next to i let's improve on Joel's suggestion:
FROM guild_message_count AS m
JOIN guild_invite_count AS i ON i.user_id = m.user_id
JOIN guild_finished_giveaways AS f ON f.guild_id = m.guild_id
JOIN guild_role_settings AS r ON r.guild_id = m.guild_id
WHERE m.user_id = 841827102331240468
Note that 3 tables are joined on guild_id, but only 2 "=" comparisons are needed.
SUM without GROUP BY sums up the entire resultset (after JOINing). But you have 6 non-aggregates, so you need to GROUP BY all 6.
But that may lead to grossly inflated sums. Maybe you need to do the aggregation just over f first since that is where you are summing. Then JOIN to the rest??
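Here is a small sketch of the "inflated sums" problem and the aggregate-first fix, using Python's sqlite3 in place of MySQL. The tables are hypothetical and heavily simplified: giveaway_winners is an assumed mapping table with one row per winner (as suggested above), not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables: one row per (giveaway, winner).
    CREATE TABLE giveaway_winners (guild_id INTEGER, winner_id INTEGER);
    CREATE TABLE guild_role_settings (guild_id INTEGER, bypass_role_id INTEGER);
    INSERT INTO giveaway_winners VALUES (1, 42), (1, 42), (1, 7);
    INSERT INTO guild_role_settings VALUES (1, 100), (1, 101);
    """
)

# Joining first repeats every winner row once per role row, so the sum doubles.
inflated = conn.execute(
    """
    SELECT SUM(w.winner_id = 42)
    FROM giveaway_winners AS w
    JOIN guild_role_settings AS r ON r.guild_id = w.guild_id
    """
).fetchone()[0]

# Aggregating in a derived table first keeps the sum at its true value.
per_role = conn.execute(
    """
    SELECT r.bypass_role_id, s.win_sum
    FROM (SELECT guild_id, SUM(winner_id = 42) AS win_sum
          FROM giveaway_winners
          GROUP BY guild_id) AS s
    JOIN guild_role_settings AS r ON r.guild_id = s.guild_id
    ORDER BY r.bypass_role_id
    """
).fetchall()

print(inflated, per_role)
```

User 42 really won twice, but the join-then-sum version reports 4 because each winner row is paired with both role rows. The derived table reports 2 for each role row.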

MySQL Query limiting results by sub table

I'm really struggling with this query and I hope somebody can help.
I am querying across multiple tables to get the dataset that I require. The following query is an anonymised version:
SELECT main_table.id,
sub_table_1.field_1,
main_table.field_1,
main_table.field_2,
main_table.field_3,
main_table.field_4,
main_table.field_5,
main_table.field_6,
main_table.field_7,
sub_table_2.field_1,
sub_table_2.field_2,
sub_table_2.field_3,
sub_table_3.field_1,
sub_table_4.field_1,
sub_table_4.field_2
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
WHERE sub_table_4.field_1 = '' AND sub_table_4.field_2 = '0' AND sub_table_2.field_1 != ''
The query works. The problem is that sub_table_1 has a revision number (int 11). Currently I get duplicate records with different revision numbers and different versions of sub_table_1.field_1, which is to be expected, but I want to limit the result set to the latest revision number, giving me only the latest sub_table_1.field_1, and I really cannot figure it out!
Can anybody lend me a hand?
Many Thanks.
It's always important to remember that a JOIN can be on a subquery as well as a table. You could build a subquery that returns the results you want to see then, once you've got the data you want, join it in the parent query.
It's hard to 'tailor' an answer that's specific to your problem, as it's too obfuscated (as you admit) to know what the data and tables really look like, but as an example:
Say table1 has four fields: id, revision_no, name and stuff. You want to return a distinct list of name values, with their latest version of stuff (which, we'll pretend varies by revision). You could do this in isolation as:
select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no;
(Note: see fiddle at the end)
That would return each individual name with the latest revision of stuff.
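A quick way to see the query above in action is the following sketch, which runs it through Python's sqlite3 module instead of MySQL. The three sample rows are invented; table1 and its four columns are the ones described in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Invented rows: name 'a' has two revisions, name 'b' has one.
    CREATE TABLE table1 (id INTEGER, revision_no INTEGER, name TEXT, stuff TEXT);
    INSERT INTO table1 VALUES
        (1, 1, 'a', 'old'), (2, 2, 'a', 'new'),
        (3, 1, 'b', 'only');
    """
)

# The inner query finds the highest revision per name;
# joining back on (name, revision_no) keeps only those rows.
latest = conn.execute(
    """
    SELECT t.name, t.revision_no, t.stuff
    FROM table1 t
    INNER JOIN (SELECT name, MAX(revision_no) AS maxr
                FROM table1
                GROUP BY name) mx
        ON mx.name = t.name AND mx.maxr = t.revision_no
    ORDER BY t.name
    """
).fetchall()

print(latest)
```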
Once you've got that nailed down, you could then swap out
INNER JOIN sub_table_1 ON sub_table_1.id = main_table.id
....with....
INNER JOIN (select t.* from table1 t
inner join
(SELECT name, max(revision_no) maxr
FROM table1
GROUP BY name) mx
on mx.name = t.name
and mx.maxr = t.revision_no) sub_table_1
ON sub_table_1.id = main_table.id
...which would allow a join with a recordset that is more tailored to that which you want to join (again, don't get hung up on the actual query I've used, it's just there to demonstrate the method).
There may well be more elegant ways to achieve this, but it's sometimes good to start with a simple solution that's easier to replicate, then simplify it once you've got the general understanding of the what and why nailed down.
Hope that helps - as I say, it's as specific as I could offer without having an idea of the real data you're using.
(for the sake of reference, here is a fiddle with a working version of the above example query)
In your case, where you only need one column from the table, make this a subquery in your select clause instead of a join. You get the latest revision by ordering by revision number descending and limiting the result to one row.
SELECT
main_table.id,
(
select sub_table_1.field_1
from sub_table_1
where sub_table_1.id = main_table.id
order by revision_number desc
limit 1
) as sub_table_1_field_1,
main_table.field_1,
...
FROM main_table
INNER JOIN sub_table_4 ON sub_table_4.id = main_table.id
INNER JOIN sub_table_2 ON sub_table_2.id = main_table.id
INNER JOIN sub_table_3 ON sub_table_3.id = main_table.id
WHERE sub_table_4.field_1 = ''
AND sub_table_4.field_2 = '0'
AND sub_table_2.field_1 != '';
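As a sanity check of the correlated-subquery idea, here is a cut-down sketch using Python's sqlite3 rather than MySQL. It keeps only main_table and sub_table_1, drops the other joins and the WHERE clause, and uses invented rows, so treat it as an illustration of the technique only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Invented rows: id 1 has two revisions, id 2 has one.
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, field_1 TEXT);
    CREATE TABLE sub_table_1 (id INTEGER, revision_number INTEGER, field_1 TEXT);
    INSERT INTO main_table VALUES (1, 'm1'), (2, 'm2');
    INSERT INTO sub_table_1 VALUES (1, 1, 'draft'), (1, 2, 'final'), (2, 1, 'first');
    """
)

# For each main_table row, the subquery picks field_1 from the highest revision.
rows = conn.execute(
    """
    SELECT main_table.id,
           (SELECT sub_table_1.field_1
            FROM sub_table_1
            WHERE sub_table_1.id = main_table.id
            ORDER BY revision_number DESC
            LIMIT 1) AS sub_table_1_field_1,
           main_table.field_1
    FROM main_table
    ORDER BY main_table.id
    """
).fetchall()

print(rows)
```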

MySQL Left Join (I think) Difficulty

I have the following query:
SELECT
u.username as username,
s.campaignno as campaign,
if(f.hometeamscore>f.awayteamscore,1,0) as Win,
if(f.hometeamscore=f.awayteamscore,1,0) as Draw,
if(f.hometeamscore<f.awayteamscore,1,0) as Loss,
f.hometeamscore as Goals,
ss.seasonid as Season,
av.avatar as Avatar
FROM
avatar_avatar av,
straightred_fixture f,
straightred_userselection s,
auth_user u,
straightred_season ss
WHERE
av.user_id = u.id
AND ss.seasonid = 1025
AND f.soccerseasonid = ss.seasonid
AND s.fixtureid = f.fixtureid
AND s.teamselectionid = f.hometeamid
AND s.user_id = u.id;
This query is working as expected, but I have now realised that a user may not have uploaded a profile picture. So the part av.user_id = u.id is excluding anyone who has NOT uploaded a profile picture. I feel I need to use a left join after reading https://www.w3schools.com/sql/sql_join.asp but I just keep going around in circles and getting nowhere.
Any guidance on this would be greatly appreciated, many thanks, Alan.
First and foremost: avoid implicit JOINs. Make JOINs explicit and you will make much more clear which entity relates to which entity, and you'll never forget to add one of the AND conditions in your WHERE and get a cartesian product.
Second: try to put your tables in the FROM using an order that follows a certain logic. In your case, you seem to start looking for ss.seasonid = 1025... (it's the only condition on the WHERE having a constant). Then, your list of conditions produces a certain logical order... Each table in the FROM has a relationship with the previous one...
That said, I think you need this query:
SELECT
u.username as username,
s.campaignno as campaign,
if(f.hometeamscore>f.awayteamscore,1,0) as Win,
if(f.hometeamscore=f.awayteamscore,1,0) as Draw,
if(f.hometeamscore<f.awayteamscore,1,0) as Loss,
f.hometeamscore as Goals,
ss.seasonid as Season,
av.avatar as Avatar
FROM
straightred_season ss
JOIN straightred_fixture f
ON f.soccerseasonid = ss.seasonid
JOIN straightred_userselection s
ON s.fixtureid = f.fixtureid AND s.teamselectionid = f.hometeamid
JOIN auth_user u
ON u.id = s.user_id
-- This last table is the one that needs to be LEFT-joined
-- if the avatar is *optional*. If it isn't there, av.avatar will just
-- be shown as NULL
LEFT JOIN avatar_avatar av
ON av.user_id = u.id
WHERE
ss.seasonid = 1025 ;
If the content of more tables is optional, you may need more than one LEFT JOIN. In order to find out what makes sense, we would need to have the full data model, or the ERD, that represents your scenario. That is, which relationships are 1 to 1, which are 1 to Many, which are 1 to (0 or 1), which are Many-to-Many, etc.
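To see the difference the LEFT JOIN makes for the optional avatar, here is a stripped-down sketch using Python's sqlite3 in place of MySQL. Only auth_user and avatar_avatar are included, reduced to the columns used here, and the two users are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Made-up data: 'beth' has not uploaded an avatar.
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE avatar_avatar (user_id INTEGER, avatar TEXT);
    INSERT INTO auth_user VALUES (1, 'alan'), (2, 'beth');
    INSERT INTO avatar_avatar VALUES (1, 'alan.png');
    """
)

# Inner join: users without an avatar row disappear.
inner_rows = conn.execute(
    "SELECT u.username, av.avatar FROM auth_user u "
    "JOIN avatar_avatar av ON av.user_id = u.id ORDER BY u.id"
).fetchall()

# Left join: every user is kept, and the avatar is NULL (None) when missing.
left_rows = conn.execute(
    "SELECT u.username, av.avatar FROM auth_user u "
    "LEFT JOIN avatar_avatar av ON av.user_id = u.id ORDER BY u.id"
).fetchall()

print(inner_rows)
print(left_rows)
```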
I'm a fan of using JOINs, so I rewrote your query like this.
Be advised, however, that I use SQL Server / Oracle and not MySQL, so I'm not sure my syntax is correct. I use the IFNULL function because, at least in my world, comparing against a column whose row isn't available (NULL) can cause the row to be filtered out of the result.
Also, by moving ss.seasonid = 1025 into the join rather than leaving it in the WHERE, you should get results regardless of whether an ss record exists.
That said, this should resolve your issues:
EDIT - replace ISNULL with IFNULL
select
u.username as username
,s.campaignno as campaign
,if(ifnull(f.hometeamscore,0)>ifnull(f.awayteamscore,0),1,0) as Win
,if(ifnull(f.hometeamscore,0)=ifnull(f.awayteamscore,-1),1,0) as Draw
,if(ifnull(f.hometeamscore,0)<ifnull(f.awayteamscore,0),1,0) as Loss
,f.hometeamscore as Goals
,ss.seasonid as Season
,av.avatar as Avatar
from
auth_user u
Left Join
avatar_avatar av on u.id = av.user_id
Left Join
straightred_userselection s on u.id = s.user_id
Left Join
straightred_fixture f on f.hometeamid = s.teamselectionid
and f.fixtureid = s.fixtureid
Left Join
straightred_season ss on f.soccerseasonid = ss.seasonid
and ss.seasonid = 1025
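The effect of putting a filter in the ON clause of a LEFT JOIN instead of the WHERE clause can be checked with a small sketch like the one below. It uses Python's sqlite3 rather than MySQL, and the selection table is a hypothetical stand-in that collapses the userselection/fixture/season chain into one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in tables; 'beth' only has a row for another season.
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE selection (user_id INTEGER, seasonid INTEGER);
    INSERT INTO auth_user VALUES (1, 'alan'), (2, 'beth');
    INSERT INTO selection VALUES (1, 1025), (2, 999);
    """
)

query = """
    SELECT u.username, s.seasonid
    FROM auth_user u
    LEFT JOIN selection s ON s.user_id = u.id {on_extra}
    {where}
    ORDER BY u.id
"""

# Filter in ON: non-matching users are kept, with NULL for the joined columns.
filter_in_on = conn.execute(
    query.format(on_extra="AND s.seasonid = 1025", where="")
).fetchall()

# Filter in WHERE: rows whose seasonid is not 1025 (or is NULL) are removed afterwards.
filter_in_where = conn.execute(
    query.format(on_extra="", where="WHERE s.seasonid = 1025")
).fetchall()

print(filter_in_on)
print(filter_in_where)
```

So with the filter in ON you get every user back, which may or may not be what you want; with the filter in WHERE the LEFT JOIN effectively behaves like an inner join for that table.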

Replacing Subqueries with Joins in MySQL

I have the following query:
SELECT PKID, QuestionText, Type
FROM Questions
WHERE PKID IN (
SELECT FirstQuestion
FROM Batch
WHERE BatchNumber IN (
SELECT BatchNumber
FROM User
WHERE RandomString = '$key'
)
)
I've heard that sub-queries are inefficient and that joins are preferred. I can't find anything explaining how to convert a 3+ tier sub-query to join notation, however, and can't get my head around it.
Can anyone explain how to do it?
SELECT DISTINCT a.*
FROM Questions a
INNER JOIN Batch b
ON a.PKID = b.FirstQuestion
INNER JOIN User c
ON b.BatchNumber = c.BatchNumber
WHERE c.RandomString = '$key'
DISTINCT was specified because a row in Questions might match multiple rows in the other tables, causing duplicate records in the result. But since you are only interested in records from the Questions table, the DISTINCT keyword will suffice.
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
Try :
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q
INNER JOIN Batch b ON q.PKID = b.FirstQuestion
INNER JOIN User u ON u.BatchNumber = b.BatchNumber
WHERE u.RandomString = '$key'
select
q.pkid,
q.questiontext,
q.type
from user u
join batch b
on u.batchnumber = b.batchnumber
join questions q
on b.firstquestion = q.pkid
where u.randomstring = '$key'
Since your WHERE clause filters on the USER table, start with that in the FROM clause. Next, apply your joins backwards.
In order to do this correctly, you need distinct in the subquery. Otherwise, you might multiply rows in the join version:
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q join
(select distinct FirstQuestion
from Batch b join user u
on b.batchnumber = u.batchnumber and
u.RandomString = '$key'
) fq
on q.pkid = fq.FirstQuestion
As to whether the in or join version is better . . . that depends. In some cases, particularly if the fields are indexed, the in version might be fine.
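To illustrate the row-multiplication point, here is a sketch that runs the IN version, a plain join version, and the DISTINCT-subquery version side by side. It uses Python's sqlite3 instead of MySQL, invented rows in which two batches for the same key start with the same question, and a bound parameter in place of '$key'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Invented rows: two batches for the same key share the same first question.
    CREATE TABLE Questions (PKID INTEGER PRIMARY KEY, QuestionText TEXT, Type TEXT);
    CREATE TABLE Batch (BatchNumber INTEGER, FirstQuestion INTEGER);
    CREATE TABLE User (BatchNumber INTEGER, RandomString TEXT);
    INSERT INTO Questions VALUES (1, 'Q one', 'text'), (2, 'Q two', 'text');
    INSERT INTO Batch VALUES (10, 1), (11, 1);
    INSERT INTO User VALUES (10, 'abc'), (11, 'abc');
    """
)
key = "abc"  # bound as a parameter rather than interpolated like '$key'

in_rows = conn.execute(
    """
    SELECT PKID FROM Questions
    WHERE PKID IN (SELECT FirstQuestion FROM Batch
                   WHERE BatchNumber IN (SELECT BatchNumber FROM User
                                         WHERE RandomString = ?))
    """, (key,)
).fetchall()

join_rows = conn.execute(
    """
    SELECT q.PKID FROM Questions q
    JOIN Batch b ON q.PKID = b.FirstQuestion
    JOIN User u ON u.BatchNumber = b.BatchNumber
    WHERE u.RandomString = ?
    """, (key,)
).fetchall()

distinct_rows = conn.execute(
    """
    SELECT q.PKID FROM Questions q
    JOIN (SELECT DISTINCT b.FirstQuestion
          FROM Batch b JOIN User u
            ON b.BatchNumber = u.BatchNumber AND u.RandomString = ?) fq
      ON q.PKID = fq.FirstQuestion
    """, (key,)
).fetchall()

print(in_rows, join_rows, distinct_rows)
```

The IN version and the DISTINCT-subquery version each return question 1 once, while the plain join returns it twice, once per matching batch.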