I want to replace the subquery with a join, if possible.
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON fftenant_surveyanswer.surveyquestion_id = 1
AND fftenant_surveyanswer.`surveyresult_id` IN (SELECT y.`surveyresult_id` FROM `fftenant_farmer_surveyresults` y WHERE y.farmer_id = `fftenant_farmer`.`person_ptr_id`)
I tried:
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
But that gave me one record per farmer per survey result for that farmer. I only want one record per farmer as returned by the first query.
A join may be faster on most RDBMs, but the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible.
You could use DISTINCT or GROUP BY, as mvds and Brilliand suggest, but I think it's closer to the query's design intent if you change the last join to an inner-join, but elevating its precedence:
SELECT farmer.person_ptr_id, surveyanswer.text_value
FROM fftenant_farmer AS farmer
INNER
JOIN fftenant_person AS person
ON person.id = farmer.person_ptr_id
LEFT
OUTER
JOIN
( fftenant_farmer_surveyresults AS farmer_surveyresults
INNER
JOIN fftenant_surveyanswer AS surveyanswer
ON surveyanswer.surveyresult_id = farmer_surveyresults.surveyresult_id
AND surveyanswer.surveyquestion_id = 1
)
ON farmer_surveyresults.farmer_id = farmer.person_ptr_id
Broadly speaking, this will end up giving the same results as the DISTINCT or GROUP BY approach, but in a more principled, less ad hoc way, IMHO.
Use SELECT DISTINCT or GROUP BY to remove the duplicate entries.
Changing your attempt as little as possible:
SELECT DISTINCT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible
Then consider a much simpler example to begin with e.g.
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T2);
This is known as a semi join and if desired may be re-written using (among other possibilities) a JOIN with a SELECT clause to a) project only from the 'outer' table, and b) return only DISTINCT rows:
SELECT DISTINCT T1.*
FROM T1
JOIN T2 USING (id);
Related
So I was working on the problem of optimizing the following query I have already optimized this to the fullest from my side can this be further optimized?
select distinct name ad_type
from dim_ad_type x where exists ( select 1
from sum_adserver_dimensions sum
left join dim_ad_tag_map on dim_ad_tag_map.id=sum.ad_tag_map_id and dim_ad_tag_map.client_id=sum.client_id
left join dim_site on dim_site.id = dim_ad_tag_map.site_id
left join dim_geo on dim_geo.id = sum.geo_id
left join dim_region on dim_region.id=dim_geo.region_id
left join dim_device_category on dim_device_category.id=sum.device_category_id
left join dim_ad_unit on dim_ad_unit.id=dim_ad_tag_map.ad_unit_id
left join dim_monetization_channel on dim_monetization_channel.id=dim_ad_tag_map.monetization_channel_id
left join dim_os on dim_os.id = sum.os_id
left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
where sum.client_id = 50
and dim_ad_type.id=x.id
)
order by 1
Your query although joined ok, is an overall bloat. You are using the dim_ad_type table on the outside, just to make sure it exists on the inside as well. You have all those left-joins that have NO bearing on the final outcome, why are they even there. I would simplify by reversing the logic. By tracing your INNER query for the same dim_ad_type table, I find the following is the direct line. sum -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct
dat.name Ad_Type
from
sum_adserver_dimensions sum
join dim_ad_tag_map tm
on sum.ad_tag_map_id = tm.id
and sum.client_id = tm.client_id
join dim_ad_type dat
on tm.ad_type_id = dat.id
where
sum.client_id = 50
order by
1
Your query was running ALL dim_ad_types, then finding all the sums just to find those that matched. Run it direct starting with the one client, then direct with JOINs.
I'm trying to study SQL.
I have a problem with JOIN
I want to display ref_id, pro_name, class_name but I couldn't.
I find EFFICIENT solution.
MY QUERY (DOESN'T WORK)
SELECT
ref_id, pro_name, class_name
FROM
RC, RP, PP, LP
WHERE
RC.ref_id = RP.ref_id
Avoid using commas be CROSS JOIN
You could use JOIN to instead of commas
like this.
SELECT
RP.ref_id, PP.pro_name, LP.class_name
FROM
RP
LEFT JOIN RC ON RC.ref_id = RP.ref_id
LEFT JOIN PP ON PP.pro_id = RP.pro_id
LEFT JOIN LP ON LP.lec_id = RP.lec_id
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
You would seem to want:
select rp.pro_id, pp.pro_name, lp.class_name
from rp left join
pp
on rp.pro_id = pp.pro_id left join
lp
on rp.lec_id = lp.lec_id;
Note the use of left join. This ensure that all rows are in the result set, even when one or the other joins doesn't find a matching record.
From what I can see, the table rc is not needed to answer this specific question.
I'm trying to left join the second table useri_ban based on the users' ids, with the extra condition: useri_ban.start_ban = max_start.
In order for me to calculate max_start, I have to run the following subquery:
(SELECT MAX(ub.start_ban) AS max_start, user_id FROM useri_ban ub WHERE ub.user_id = useri.id)
Furthermore, in order to add max_start to every row, I need to inner join this subquery's result into the main result. However, it seems that once I apply that join, the subquery is no longer able to access useri.id.
What am I doing wrong?
SELECT
useri.id as id,
useri.email as email,
useri_ban.warning_type_id as warning_type_id,
useri_ban.type as type,
useri.created_at AS created_at
FROM `useri`
inner join
(SELECT MAX(ub.start_ban) AS max_start, user_id FROM useri_ban ub WHERE ub.user_id = useri.id) `temp`
on `useri`.`id` = `temp`.`user_id`
left join `useri_ban` on `useri_ban`.`user_id` = `useri`.`id` and `useri_ban`.`start_ban` = `max_start`
Does this solve your problem? You need GROUP BY in the inner query instead of another join.
SELECT useri.id, useri.email, maxQuery.maxStartBan
FROM useri
INNER JOIN
(
SELECT useri_ban.user_id ubid, MAX(useri_ban.startban) maxStartBan
FROM useri_ban
GROUP BY useri_ban.user_id
) AS maxQuery
ON maxQuery.ubid = useri.id;
At work I came across this sort of query:
select distinct psv.pack_id from pack_store_variant psv, pack p
where p.id = psv.pack_id and p.store_id = 1 and psv.store_variant_id = 196;
Being new to the select from table1, table2 I did a bit of searching and found out this does basically a cartesian product of the two tables. I thought that this is unnecessarily creating NxM rows, we can just use a regular join and it should work. So I wrote this query:
SELECT DISTINCT pack_id from (SELECT pack_id, store_id, store_variant_id
FROM pack JOIN pack_store_variant ON pack.id = pack_store_variant.pack_id) as comb
where comb.store_id = 1 AND comb.store_variant_id = 196;
Surprisingly when I did the comparison, the first one was an order of magnitude faster than mine. Does my query suck somehow? Or am I not understanding the difference between cross join/inner join properly?
Your query is not so good. You split your query into two selects. The inner one creates a table, on which you then select again. That's not very efficient. This is how I would do it.
select distinct psv.pack_id from pack_store_variant as psv
Join pack as p on p.id = psv.pack_id
where p.store_id = 1 and psv.store_variant_id = 196;
I have the following query:
SELECT PKID, QuestionText, Type
FROM Questions
WHERE PKID IN (
SELECT FirstQuestion
FROM Batch
WHERE BatchNumber IN (
SELECT BatchNumber
FROM User
WHERE RandomString = '$key'
)
)
I've heard that sub-queries are inefficient and that joins are preferred. I can't find anything explaining how to convert a 3+ tier sub-query to join notation, however, and can't get my head around it.
Can anyone explain how to do it?
SELECT DISTINCT a.*
FROM Questions a
INNER JOIN Batch b
ON a.PKID = b.FirstQuestion
INNER JOIN User c
ON b.BatchNumber = c.BatchNumber
WHERE c.RandomString = '$key'
The reason why DISTINCT was specified is because there might be rows that matches to multiple rows on the other tables causing duplicate record on the result. But since you are only interested on records on table Questions, a DISTINCT keyword will suffice.
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
Try :
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q
INNER JOIN Batch b ON q.PKID = b.FirstQuestion
INNER JOIN User u ON u.BatchNumber = q.BatchNumber
WHERE u.RandomString = '$key'
select
q.pkid,
q.questiontext,
q.type
from user u
join batch b
on u.batchnumber = b.batchnumber
join questions q
on b.firstquestion = q.pkid
where u.randomstring = '$key'
Since your WHERE clause filters on the USER table, start with that in the FROM clause. Next, apply your joins backwards.
In order to do this correctly, you need distinct in the subquery. Otherwise, you might multiply rows in the join version:
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q join
(select distinct FirstQuestion
from Batch b join user u
on b.batchnumber = u.batchnumber and
u.RandomString = '$key'
) fq
on q.pkid = fq.FirstQuestion
As to whether the in or join version is better . . . that depends. In some cases, particularly if the fields are indexed, the in version might be fine.