MySQL involving crazy multiple self-joins - mysql

As part of the process of replacing some old code that used an incredibly slow nested select, I've ended up with a query that looks like this:
SELECT r3.r_id AS r3_id, r2.r_id AS r2_id, r1.r_id AS r1_id
FROM
table_r r3
LEFT JOIN (
table_r r2
INNER JOIN (
table_r r1
INNER JOIN table_d d ON r1.r_id = d.r_id
) ON r2.r_id = r1.parent_id
) ON r3.r_id = r2.r_id
WHERE d.d_id = 3
So in the innermost join, I'm looking for the records in table_r (copy r1) which have a relationship with a subset of records from table_d.
In the next join out, I'm looking for records in a second copy of table_r (r2) whose main index (r_id) matches the parent index (parent_id) of the records from the previous join.
Then I'm trying to do a LEFT JOIN with a third copy of table_r (r3), simply matching r_id with the r_id of the previous join. The idea of this outermost join is to get ALL of the records from table_r, but to then do the equivalent of a NOT IN select by using a further condition (not yet in my query) to determine which records in r3 have NULLs for r2_id.
The problem is that the LEFT JOIN is not giving me the whole of table_r. It's giving me the same subset of records that I get without the final join - in other words, the same thing as an INNER JOIN. So whereas I'm expecting 1208 records, I get 508.
I know I must be doing something screwy here...

What happens if you try this?
SELECT r3.r_id AS r3_id, r2.r_id AS r2_id, r1.r_id AS r1_id
FROM
table_r r3
LEFT JOIN (
table_r r2
INNER JOIN (
table_r r1
INNER JOIN table_d d ON r1.r_id = d.r_id AND d.d_id = 3
) ON r2.r_id = r1.parent_id
) ON r3.r_id = r2.r_id
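What I did was move the d.d_id = 3 condition from the WHERE clause into the ON clause of the INNER JOIN. In the WHERE clause it is evaluated after the LEFT JOIN, where d.d_id is NULL for every r3 row without a match, so those rows are discarded and the LEFT JOIN behaves like an INNER JOIN.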
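To see the effect in isolation, here is a minimal sketch using Python's sqlite3 module and a cut-down two-table version of the schema. The sample rows are made up for illustration, and SQLite stands in for MySQL here; the WHERE-versus-ON behaviour of an outer join is the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_r (r_id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE table_d (d_id INTEGER, r_id INTEGER);
    -- hypothetical sample data
    INSERT INTO table_r VALUES (1, NULL), (2, 1), (3, 1);
    INSERT INTO table_d VALUES (3, 2), (4, 3);
""")

# Filter in WHERE: unmatched rows have d.d_id = NULL, so WHERE drops them
# and the LEFT JOIN returns the same rows an INNER JOIN would.
where_rows = conn.execute("""
    SELECT r.r_id, d.d_id
    FROM table_r r
    LEFT JOIN table_d d ON r.r_id = d.r_id
    WHERE d.d_id = 3
    ORDER BY r.r_id
""").fetchall()

# Filter in ON: the condition only decides which d rows match,
# so every row of table_r is kept, NULL-extended where nothing matches.
on_rows = conn.execute("""
    SELECT r.r_id, d.d_id
    FROM table_r r
    LEFT JOIN table_d d ON r.r_id = d.r_id AND d.d_id = 3
    ORDER BY r.r_id
""").fetchall()

print(where_rows)  # [(2, 3)]
print(on_rows)     # [(1, None), (2, 3), (3, None)]
```

With the filter in ON, the rows where d_id is NULL are exactly the ones a later NOT IN style condition would pick out.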

Related

SQL query optimization for speed

I was working on optimizing the following query. I have already optimized it as far as I can; can it be optimized further?
select distinct name ad_type
from dim_ad_type x where exists ( select 1
from sum_adserver_dimensions sum
left join dim_ad_tag_map on dim_ad_tag_map.id=sum.ad_tag_map_id and dim_ad_tag_map.client_id=sum.client_id
left join dim_site on dim_site.id = dim_ad_tag_map.site_id
left join dim_geo on dim_geo.id = sum.geo_id
left join dim_region on dim_region.id=dim_geo.region_id
left join dim_device_category on dim_device_category.id=sum.device_category_id
left join dim_ad_unit on dim_ad_unit.id=dim_ad_tag_map.ad_unit_id
left join dim_monetization_channel on dim_monetization_channel.id=dim_ad_tag_map.monetization_channel_id
left join dim_os on dim_os.id = sum.os_id
left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
where sum.client_id = 50
and dim_ad_type.id=x.id
)
order by 1
Your query joins correctly but is bloated. You use dim_ad_type on the outside only to check that the same row exists on the inside, and most of the left joins have no bearing on the final result, so why are they there? I would simplify by reversing the logic. Tracing your inner query back to dim_ad_type, the direct path is sum -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct
dat.name Ad_Type
from
sum_adserver_dimensions sum
join dim_ad_tag_map tm
on sum.ad_tag_map_id = tm.id
and sum.client_id = tm.client_id
join dim_ad_type dat
on tm.ad_type_id = dat.id
where
sum.client_id = 50
order by
1
Your query was scanning ALL of dim_ad_type, then finding all the sum rows for each one just to see which matched. Instead, start directly from the one client and follow the JOINs.
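As a sanity check that the two shapes return the same names, here is a small sketch using Python's sqlite3 module. The three tables are reduced to the columns on the direct path, the rows are invented, and the alias s replaces sum; none of this comes from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_ad_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_ad_tag_map (id INTEGER PRIMARY KEY, client_id INTEGER, ad_type_id INTEGER);
    CREATE TABLE sum_adserver_dimensions (ad_tag_map_id INTEGER, client_id INTEGER);
    -- hypothetical sample data
    INSERT INTO dim_ad_type VALUES (1, 'banner'), (2, 'video'), (3, 'native');
    INSERT INTO dim_ad_tag_map VALUES (10, 50, 1), (11, 50, 2), (12, 60, 3);
    INSERT INTO sum_adserver_dimensions VALUES (10, 50), (10, 50), (11, 50), (12, 60);
""")

# Original shape: outer dim_ad_type probed with a correlated EXISTS.
exists_rows = conn.execute("""
    SELECT DISTINCT name
    FROM dim_ad_type x
    WHERE EXISTS (
        SELECT 1
        FROM sum_adserver_dimensions s
        LEFT JOIN dim_ad_tag_map tm
            ON tm.id = s.ad_tag_map_id AND tm.client_id = s.client_id
        LEFT JOIN dim_ad_type dat ON dat.id = tm.ad_type_id
        WHERE s.client_id = 50 AND dat.id = x.id
    )
    ORDER BY 1
""").fetchall()

# Simplified shape: start from the one client and join straight through.
join_rows = conn.execute("""
    SELECT DISTINCT dat.name
    FROM sum_adserver_dimensions s
    JOIN dim_ad_tag_map tm
        ON s.ad_tag_map_id = tm.id AND s.client_id = tm.client_id
    JOIN dim_ad_type dat ON tm.ad_type_id = dat.id
    WHERE s.client_id = 50
    ORDER BY 1
""").fetchall()

print(exists_rows)  # [('banner',), ('video',)]
print(join_rows)    # [('banner',), ('video',)]
```

The left joins in the original can become inner joins without changing the result because the condition dat.id = x.id already rejects any row where dat is NULL.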

PHP MySQL: Join 3 tables but

I have 3 tables, errorcode_table, description_table, and customer_table.
The query below will display all records that are in the errorcode_table and I have an inner join that will also display the customer_table as per the serial number in both tables.
SELECT
errorcode_table.error,
errorcode_table.deviceserialnumber,
customer_table.serialnumber,
customer_table.customer
FROM errorcode_table
INNER JOIN customer_table
ON errorcode_table.deviceserialnumber = customer_table.serialnumber
Now I want to also display the description of the error code as well, here's my attempt:
SELECT
errorcode_table.error,
errorcode_table.serialnumber,
customer_table.serialnumber,
customer_table.customer,
description.serialnumber
description.info
FROM errorcode_table
INNER JOIN customer_table
RIGHT JOIN description_table
ON errorcode_table.deviceserialnumber = customer_table.serialnumber
ON errorcode_table.deviceserialnumber = description_table.serialnumber
Now I'm not getting any records. Please assist.
The ON clause for each join should appear immediately after the table being joined, not grouped together at the end. You can also introduce table aliases to make the query easier to read.
SELECT
e.error,
e.deviceserialnumber,
c.serialnumber,
c.customer,
d.serialnumber,
d.info
FROM errorcode_table e
INNER JOIN customer_table c
ON e.deviceserialnumber = c.serialnumber
RIGHT JOIN description_table d
ON e.deviceserialnumber = d.serialnumber;

SQL Join with Value Missing in One Column

I'm trying to create a SQL query that uses one table to count the number of blade servers our company has in each chassis and groups those, while joining it with chassis information from another table.
However, one of the chassis has no blades in it, so its name does not appear in the blade inventory table. Using an INNER JOIN produces a result that doesn't contain that chassis at all. A LEFT JOIN has the same effect, but a RIGHT JOIN gives me an extra row with a null value for the chassis name.
I'm guessing this is because the non-existence of that blade name in the first table is being given precedence over the second, but not sure how to correct that. My query, as of now, looks like this:
SELECT e.EnclosureName, e.PDUName, q.Blades, r.Serial#
FROM bladeinventory.table e JOIN
(
SELECT EnclosureName,COUNT(*) Blades
FROM bladeinventory.table
GROUP BY EnclosureName
) q ON e.EnclosureName = q.EnclosureName
LEFT JOIN chassisinventory.table r
ON e.EnclosureName = r.EnclosureName
GROUP BY e.EnclosureName, e.PDUName, q.Blades, r.Serial#
Is it possible to edit this in such a way that the name of the chassis with 0 blades is actually generated by the query?
Just pull the name from the chassisinventory table. I'll use coalesce(), just in case you switch the order of the joins (again):
SELECT COALESCE(r.EnclosureName, e.EnclosureName) as EnclosureName, e.PDUName, q.Blades, r.Serial#
FROM bladeinventory.table e JOIN
(SELECT EnclosureName,COUNT(*) Blades
FROM bladeinventory.table
GROUP BY EnclosureName
) q
ON e.EnclosureName = q.EnclosureName LEFT JOIN
chassisinventory.table r
ON e.EnclosureName = r.EnclosureName
GROUP BY COALESCE(r.EnclosureName, e.EnclosureName), e.PDUName, q.Blades, r.Serial#;
You can also use the query below, which starts from the chassis table so that every chassis appears and uses CASE to report 0 blades where there are none. I find it simpler:
SELECT e.EnclosureName, r.PDUName,
case when q.Blades IS NULL then 0
else q.Blades end Blades,
e.Serial#
FROM chassisinventory.table e
LEFT OUTER JOIN bladeinventory.table r on e.EnclosureName = r.EnclosureName
LEFT OUTER JOIN (SELECT EnclosureName,COUNT(*) Blades
FROM bladeinventory.table
GROUP BY EnclosureName
) q on e.EnclosureName = q.EnclosureName
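A minimal sketch of the "start from the chassis table" idea, using Python's sqlite3 module. The table names, the Serial and BladeName columns, and the rows are simplified stand-ins for the real schema, and COALESCE is used in place of the CASE expression (the two are equivalent here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chassisinventory (EnclosureName TEXT, Serial TEXT);
    CREATE TABLE bladeinventory (EnclosureName TEXT, BladeName TEXT);
    -- hypothetical sample data: enc-c has no blades
    INSERT INTO chassisinventory VALUES ('enc-a', 'S1'), ('enc-b', 'S2'), ('enc-c', 'S3');
    INSERT INTO bladeinventory VALUES ('enc-a', 'b1'), ('enc-a', 'b2'), ('enc-b', 'b3');
""")

# Drive the query from the chassis table so every chassis is kept,
# then LEFT JOIN the per-chassis blade counts and turn NULL into 0.
rows = conn.execute("""
    SELECT c.EnclosureName, c.Serial, COALESCE(q.Blades, 0) AS Blades
    FROM chassisinventory c
    LEFT JOIN (
        SELECT EnclosureName, COUNT(*) AS Blades
        FROM bladeinventory
        GROUP BY EnclosureName
    ) q ON q.EnclosureName = c.EnclosureName
    ORDER BY c.EnclosureName
""").fetchall()

print(rows)  # [('enc-a', 'S1', 2), ('enc-b', 'S2', 1), ('enc-c', 'S3', 0)]
```

Because the counts are aggregated before the join, each chassis produces exactly one row and no outer GROUP BY is needed.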

Sql query taking long time with inner join

I am supposed to write a query which requires joining 3 tables.
The query designed by me works fine, but it takes a lot of time to execute.
SELECT v.LinkID, r.SourcePort, r.DestPort, r.NoOfBytes, r.StartTime , r.EndTime, r.Direction, r.nFlows
FROM LINK_TBL v
INNER JOIN NODEIF_TBL n
INNER JOIN RAW_TBL r ON
r.RouterIP=n.ifipaddress
and n.NodeNumber=v.orinodenumber
and v.oriIfIndex=r.OriIfIndex;
Is there any issue w.r.t performance in this query ?
Try this one, with each ON condition attached to its own join:
SELECT v.LinkID, r.SourcePort, r.DestPort, r.NoOfBytes, r.StartTime , r.EndTime, r.Direction, r.nFlows
FROM LINK_TBL v
INNER JOIN NODEIF_TBL n ON (n.NodeNumber=v.orinodenumber )
INNER JOIN RAW_TBL r ON (r.RouterIP=n.ifipaddress and v.oriIfIndex=r.OriIfIndex)
Try this:
SELECT v.LinkID, r.SourcePort, r.DestPort, r.NoOfBytes, r.StartTime , r.EndTime, r.Direction, r.nFlows
FROM LINK_TBL v
INNER JOIN NODEIF_TBL n ON
n.NodeNumber=v.orinodenumber
INNER JOIN RAW_TBL r ON
r.RouterIP=n.ifipaddress
and v.oriIfIndex=r.OriIfIndex;
The join order is somewhat unusual. I don't work with MySQL, so maybe this is just a MySQL-specific way to join, but usually you join like this:
FROM
a
INNER JOIN b ON a.id1 = b.id2
INNER JOIN c ON b.id3 = c.id4
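Because you are using INNER JOIN, joining this way first narrows a down to the rows that match b, and only then joins the result to c, which can save a lot of comparisons. Imagine each table has a thousand rows. If the first join has no condition of its own, a and b are logically combined into 1000 * 1000 pairs before the final ON filters them, whereas with a condition on each join (and indexes on the join columns) the work is closer to one lookup per row. The optimizer may well plan both forms the same way, so compare them with EXPLAIN before drawing conclusions.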
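For what it's worth, the two ways of writing the joins are logically equivalent for inner joins, which is easy to check on toy data. The sketch below uses Python's sqlite3 module with trimmed-down versions of the three tables; the columns kept and the rows are assumptions made for the example, and it only compares results, not speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LINK_TBL (LinkID INTEGER, orinodenumber INTEGER, oriIfIndex INTEGER);
    CREATE TABLE NODEIF_TBL (NodeNumber INTEGER, ifipaddress TEXT);
    CREATE TABLE RAW_TBL (RouterIP TEXT, OriIfIndex INTEGER, NoOfBytes INTEGER);
    -- hypothetical sample data
    INSERT INTO LINK_TBL VALUES (1, 100, 1), (2, 200, 2);
    INSERT INTO NODEIF_TBL VALUES (100, '10.0.0.1'), (200, '10.0.0.2');
    INSERT INTO RAW_TBL VALUES ('10.0.0.1', 1, 500), ('10.0.0.2', 2, 700), ('10.0.0.2', 9, 900);
""")

# Form from the question: all conditions in a single trailing ON.
single_on = conn.execute("""
    SELECT v.LinkID, r.NoOfBytes
    FROM LINK_TBL v
    INNER JOIN NODEIF_TBL n
    INNER JOIN RAW_TBL r ON
        r.RouterIP = n.ifipaddress
        AND n.NodeNumber = v.orinodenumber
        AND v.oriIfIndex = r.OriIfIndex
    ORDER BY v.LinkID
""").fetchall()

# Form from the answers: each join carries its own ON condition.
per_join_on = conn.execute("""
    SELECT v.LinkID, r.NoOfBytes
    FROM LINK_TBL v
    INNER JOIN NODEIF_TBL n ON n.NodeNumber = v.orinodenumber
    INNER JOIN RAW_TBL r ON r.RouterIP = n.ifipaddress AND v.oriIfIndex = r.OriIfIndex
    ORDER BY v.LinkID
""").fetchall()

print(single_on)    # [(1, 500), (2, 700)]
print(per_join_on)  # [(1, 500), (2, 700)]
```

Since the results match, any speed difference comes down to the plan the optimizer picks and the indexes available on the join columns, which is what EXPLAIN will show on the real tables.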

Is it possible to convert this subquery into a join?

I want to replace the subquery with a join, if possible.
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON fftenant_surveyanswer.surveyquestion_id = 1
AND fftenant_surveyanswer.`surveyresult_id` IN (SELECT y.`surveyresult_id` FROM `fftenant_farmer_surveyresults` y WHERE y.farmer_id = `fftenant_farmer`.`person_ptr_id`)
I tried:
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
But that gave me one record per farmer per survey result for that farmer. I only want one record per farmer as returned by the first query.
A join may be faster on most RDBMSs, but the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible.
You could use DISTINCT or GROUP BY, as mvds and Brilliand suggest, but I think it's closer to the query's design intent if you change the last join to an inner join and raise its precedence with parentheses:
SELECT farmer.person_ptr_id, surveyanswer.text_value
FROM fftenant_farmer AS farmer
INNER JOIN fftenant_person AS person
ON person.id = farmer.person_ptr_id
LEFT OUTER JOIN
( fftenant_farmer_surveyresults AS farmer_surveyresults
INNER JOIN fftenant_surveyanswer AS surveyanswer
ON surveyanswer.surveyresult_id = farmer_surveyresults.surveyresult_id
AND surveyanswer.surveyquestion_id = 1
)
ON farmer_surveyresults.farmer_id = farmer.person_ptr_id
Broadly speaking, this will end up giving the same results as the DISTINCT or GROUP BY approach, but in a more principled, less ad hoc way, IMHO.
Use SELECT DISTINCT or GROUP BY to remove the duplicate entries.
Changing your attempt as little as possible:
SELECT DISTINCT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
"the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible"
Then consider a much simpler example to begin with e.g.
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T2);
This is known as a semi-join. If desired, it can be rewritten (among other possibilities) as a JOIN whose SELECT clause a) projects only from the 'outer' table and b) returns only DISTINCT rows:
SELECT DISTINCT T1.*
FROM T1
JOIN T2 USING (id);
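To make the role of DISTINCT concrete, here is a small sketch using Python's sqlite3 module. T1 gets an extra label column and both tables get made-up rows, with a duplicate id in T2 on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER, label TEXT);
    CREATE TABLE T2 (id INTEGER);
    -- hypothetical sample data: id 1 appears twice in T2
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO T2 VALUES (1), (1), (3);
""")

# Semi-join written with IN: each T1 row appears at most once.
in_rows = conn.execute("""
    SELECT * FROM T1 WHERE id IN (SELECT id FROM T2) ORDER BY id
""").fetchall()

# Plain JOIN: the duplicate id in T2 duplicates the T1 row.
join_rows = conn.execute("""
    SELECT T1.* FROM T1 JOIN T2 USING (id) ORDER BY T1.id
""").fetchall()

# JOIN plus DISTINCT: projects only T1 and collapses the duplicates.
distinct_rows = conn.execute("""
    SELECT DISTINCT T1.* FROM T1 JOIN T2 USING (id) ORDER BY T1.id
""").fetchall()

print(in_rows)        # [(1, 'a'), (3, 'c')]
print(join_rows)      # [(1, 'a'), (1, 'a'), (3, 'c')]
print(distinct_rows)  # [(1, 'a'), (3, 'c')]
```

One caveat: if T1 itself contained two identical rows, IN would keep both while DISTINCT would merge them, so the rewrite is only exact when T1 rows are unique (for example, when T1 has a primary key in the projection).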