Cross Join much faster than Inner Join - mysql

At work I came across this sort of query:
select distinct psv.pack_id from pack_store_variant psv, pack p
where p.id = psv.pack_id and p.store_id = 1 and psv.store_variant_id = 196;
Being new to the select from table1, table2 I did a bit of searching and found out this does basically a cartesian product of the two tables. I thought that this is unnecessarily creating NxM rows, we can just use a regular join and it should work. So I wrote this query:
SELECT DISTINCT pack_id from (SELECT pack_id, store_id, store_variant_id
FROM pack JOIN pack_store_variant ON pack.id = pack_store_variant.pack_id) as comb
where comb.store_id = 1 AND comb.store_variant_id = 196;
Surprisingly when I did the comparison, the first one was an order of magnitude faster than mine. Does my query suck somehow? Or am I not understanding the difference between cross join/inner join properly?

Your query is not so good. You split your query into two selects. The inner one creates a table, on which you then select again. That's not very efficient. This is how I would do it.
select distinct psv.pack_id from pack_store_variant as psv
Join pack as p on p.id = psv.pack_id
where p.store_id = 1 and psv.store_variant_id = 196;

Related

SQL query optimization for speed

So I was working on the problem of optimizing the following query I have already optimized this to the fullest from my side can this be further optimized?
select distinct name ad_type
from dim_ad_type x where exists ( select 1
from sum_adserver_dimensions sum
left join dim_ad_tag_map on dim_ad_tag_map.id=sum.ad_tag_map_id and dim_ad_tag_map.client_id=sum.client_id
left join dim_site on dim_site.id = dim_ad_tag_map.site_id
left join dim_geo on dim_geo.id = sum.geo_id
left join dim_region on dim_region.id=dim_geo.region_id
left join dim_device_category on dim_device_category.id=sum.device_category_id
left join dim_ad_unit on dim_ad_unit.id=dim_ad_tag_map.ad_unit_id
left join dim_monetization_channel on dim_monetization_channel.id=dim_ad_tag_map.monetization_channel_id
left join dim_os on dim_os.id = sum.os_id
left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
where sum.client_id = 50
and dim_ad_type.id=x.id
)
order by 1
Your query although joined ok, is an overall bloat. You are using the dim_ad_type table on the outside, just to make sure it exists on the inside as well. You have all those left-joins that have NO bearing on the final outcome, why are they even there. I would simplify by reversing the logic. By tracing your INNER query for the same dim_ad_type table, I find the following is the direct line. sum -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct
dat.name Ad_Type
from
sum_adserver_dimensions sum
join dim_ad_tag_map tm
on sum.ad_tag_map_id = tm.id
and sum.client_id = tm.client_id
join dim_ad_type dat
on tm.ad_type_id = dat.id
where
sum.client_id = 50
order by
1
Your query was running ALL dim_ad_types, then finding all the sums just to find those that matched. Run it direct starting with the one client, then direct with JOINs.

Summing multiple columns - unexpected results

Why
Does this give incorrect results?
SELECT
people.name,
SUM(allorders.TOTAL),
SUM(allorders.DISCOUNT),
SUM(allorders.SERVICECHARGE),
SUM(payments.AMOUNT)
FROM
people
INNER JOIN
allorders ON allorders.CUSTOMER = people.ID
INNER JOIN
payments ON payments.CUSTOMER = people.ID
WHERE
people.ID = 7 AND allorders.VOIDED = 0 AND payments.VOIDED = 0
Gives: (the name), 1644000, 1100000, 50000, 1485000
If I do it two tables at a time (INNER JOIN people ON allorders.CUSTOMER = people.ID) in separate queries, I get the correct results. I don't don't even know where the numbers I get come from. Like:
SELECT
people.name,
SUM(allorders.TOTAL),
SUM(allorders.DISCOUNT),
SUM(allorders.SERVICECHARGE)
FROM
people
INNER JOIN
allorders ON allorders.CUSTOMER = people.ID
WHERE people.ID = 7 AND allorders.VOIDED = 0
Gives: (the name), 822000, 550000, 25000
SELECT
people.name,
SUM(payments.AMOUNT)
FROM
people
INNER JOIN payments ON payments.CUSTOMER = people.ID
WHERE people.ID = 7 AND payments.VOIDED = 0
Gives: (the name), 297000
It looks like it doubles, but I don't know why.
The odd thing is I have a similar query that does this sum correctly. I'll post it, but it's a bit complex. Here goes:
SELECT
t1.IDENTIFIER,
ifnull(t1.NAME,""),
t1.PRICE,
t1.GUESTS,
t1.STATUS,
ifnull(t1.NOTE,""),
t1.LINK,
ifnull(t1.EDITOR,""),
concat(t2.FIRSTNAME,"",t2.LASTNAME),
t2.ID,
t3.ID,
ifnull(t1.EMAIL,""),
ifnull(t3.PHONE,""),
ifnull(SUM(p1.AMOUNT),0),
ifnull(SUM(o1.DISCOUNT),0),
ifnull(SUM(o1.TOTAL),0),
ifnull(SUM(o1.SERVICECHARGE),0)
FROM
tables t1
INNER JOIN
people t2 ON t1.SELLER = t2.ID
INNER JOIN
people t3 ON t1.CUSTOMER = t3.ID
INNER JOIN
orderpaymentinfo ON orderpaymentinfo.TABLEID = t1.IDENTIFIER
INNER JOIN
payments p1 ON orderpaymentinfo.PAYMENTID = p1.PAYMENTID
INNER JOIN
allorders o1 ON o1.ORDERID = orderpaymentinfo.ORDERID
WHERE
p1.VOIDED = 0 AND o1.VOIDED = 0 AND t1.DATE = "2014-12-20"
GROUP BY t1.IDENTIFIER
The latter statement does the same join, only it uses an additional helper-table. I'm sorry it's a bit poorly formatted (I'm not great with SO's formatter), but if someone can tell me the difference between the logic in these two statements and how one can be completely wrong while the other right, I'd be very happy.
In response to answer:
Result 1:
Name - 5
Result 2:
Name - 2
Result 3:
Name - 10
Result 4 is truncated in phpMyAdmin - where would I get this easily?
Table structure for the three tables looks like:
SHOW create on the way.
Okay, so I am pretty sure you've a join condition that's basically exploding your result set into something like a Cartesian product. Here's what I think you should try
First, run the following and share the output:
SELECT p.name,COUNT(*)
FROM people as p
INNER JOIN allorders AS a
ON a.CUSTOMER = p.ID
WHERE p.ID = 7 AND a.VOIDED = 0
GROUP BY p.name
Then run
SELECT p.name,COUNT(*)
FROM people AS p
INNER JOIN payments AS pay
ON pay.CUSTOMER = p.ID
WHERE p.ID = 7 AND pay.VOIDED = 0
GROUP BY p.name
Then run
SELECT
p.name,
COUNT(*)
FROM
people as p
INNER JOIN
allorders as a ON a.CUSTOMER = p.ID
INNER JOIN
payments as pay ON pay.CUSTOMER = p.ID
WHERE
p.ID = 7 AND a.VOIDED = 0 AND pay.VOIDED = 0
GROUP BY p.name
Last run the following
SHOW CREATE TABLE people;
SHOW CREATE TABLE payments;
SHOW CREATE TABLE allorders;
The problem is that you don't have the correct understanding of your data. You need to give us a bit more info about the data and the relationships, and the output I've described here should help. Mine is not an answer. But if you run these queries and paste the output of them, you should be able to get an answer, either from me or someone else.
Based on the discussion and edits above, please try:
SELECT
p.name,
SUM(o.TOTAL),
SUM(o.DISCOUNT),
SUM(o.SERVICECHARGE),
MAX(pay.amt)
FROM
people as p
INNER JOIN
allorders AS o ON o.CUSTOMER = p.ID
INNER JOIN (SELECT customer,
SUM(amount) as amt
FROM payments
WHERE voided = 0 AND customer = 7
GROUP BY customer) AS pay
ON pay.customer = p.id
WHERE
p.ID = 7 AND o.VOIDED = 0
GROUP BY p.name
You could also do a subquery in your SELECT statement, but it's pretty obnoxious imo. You could also do min(pay.amt) or avg or even just leave the aggregate out altogether. The above should work... even though there are cleaner ways. I'm providing this answer so you can reason about why you were getting the unexpected result... actually optimizing your query is a different question that you can dive into later, once you've had a chance to look over this

Replacing Subqueries with Joins in MySQL

I have the following query:
SELECT PKID, QuestionText, Type
FROM Questions
WHERE PKID IN (
SELECT FirstQuestion
FROM Batch
WHERE BatchNumber IN (
SELECT BatchNumber
FROM User
WHERE RandomString = '$key'
)
)
I've heard that sub-queries are inefficient and that joins are preferred. I can't find anything explaining how to convert a 3+ tier sub-query to join notation, however, and can't get my head around it.
Can anyone explain how to do it?
SELECT DISTINCT a.*
FROM Questions a
INNER JOIN Batch b
ON a.PKID = b.FirstQuestion
INNER JOIN User c
ON b.BatchNumber = c.BatchNumber
WHERE c.RandomString = '$key'
The reason why DISTINCT was specified is because there might be rows that matches to multiple rows on the other tables causing duplicate record on the result. But since you are only interested on records on table Questions, a DISTINCT keyword will suffice.
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
Try :
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q
INNER JOIN Batch b ON q.PKID = b.FirstQuestion
INNER JOIN User u ON u.BatchNumber = q.BatchNumber
WHERE u.RandomString = '$key'
select
q.pkid,
q.questiontext,
q.type
from user u
join batch b
on u.batchnumber = b.batchnumber
join questions q
on b.firstquestion = q.pkid
where u.randomstring = '$key'
Since your WHERE clause filters on the USER table, start with that in the FROM clause. Next, apply your joins backwards.
In order to do this correctly, you need distinct in the subquery. Otherwise, you might multiply rows in the join version:
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q join
(select distinct FirstQuestion
from Batch b join user u
on b.batchnumber = u.batchnumber and
u.RandomString = '$key'
) fq
on q.pkid = fq.FirstQuestion
As to whether the in or join version is better . . . that depends. In some cases, particularly if the fields are indexed, the in version might be fine.

Is it possible to convert this subquery into a join?

I want to replace the subquery with a join, if possible.
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON fftenant_surveyanswer.surveyquestion_id = 1
AND fftenant_surveyanswer.`surveyresult_id` IN (SELECT y.`surveyresult_id` FROM `fftenant_farmer_surveyresults` y WHERE y.farmer_id = `fftenant_farmer`.`person_ptr_id`)
I tried:
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
But that gave me one record per farmer per survey result for that farmer. I only want one record per farmer as returned by the first query.
A join may be faster on most RDBMs, but the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible.
You could use DISTINCT or GROUP BY, as mvds and Brilliand suggest, but I think it's closer to the query's design intent if you change the last join to an inner-join, but elevating its precedence:
SELECT farmer.person_ptr_id, surveyanswer.text_value
FROM fftenant_farmer AS farmer
INNER
JOIN fftenant_person AS person
ON person.id = farmer.person_ptr_id
LEFT
OUTER
JOIN
( fftenant_farmer_surveyresults AS farmer_surveyresults
INNER
JOIN fftenant_surveyanswer AS surveyanswer
ON surveyanswer.surveyresult_id = farmer_surveyresults.surveyresult_id
AND surveyanswer.surveyquestion_id = 1
)
ON farmer_surveyresults.farmer_id = farmer.person_ptr_id
Broadly speaking, this will end up giving the same results as the DISTINCT or GROUP BY approach, but in a more principled, less ad hoc way, IMHO.
Use SELECT DISTINCT or GROUP BY to remove the duplicate entries.
Changing your attempt as little as possible:
SELECT DISTINCT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible
Then consider a much simpler example to begin with e.g.
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T2);
This is known as a semi join and if desired may be re-written using (among other possibilities) a JOIN with a SELECT clause to a) project only from the 'outer' table, and b) return only DISTINCT rows:
SELECT DISTINCT T1.*
FROM T1
JOIN T2 USING (id);

MySql Query takes forever

hi I am doing A query to get some product info, but there is something strange going on, the first query returns resultset fast (.1272s) but the second (note that I just added 1 column) takes forever to complete (28-35s), anyone know what is happening?
query 1
SELECT
p.partnumberp,
p.model,
p.descriptionsmall,
p.brandname,
sum(remainderint) stockint
from
inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid= ind.inventoryinid)
group by partnumberp, projectid
query 2
SELECT
p.partnumberp,
p.model,
p.descriptionsmall,
p.brandname,
p.descriptiondetail,
sum(remainderint) stockint
from
inventario_dbo.inventoryindetails inda
left join purchaseorders.product p on (p.partnumberp = inda.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid= inda.inventoryinid)
group by partnumberp, projectid
You shouldn't group by some columns and then select other columns unless you use aggregate functions. Only p.partnumberp and sum(remainderint) make sense here. You're doing a huge join and select and then the results for most rows just end up getting discarded.
You can make the query much faster by doing an inner select first and then joining that to the remaining tables to get your final result for the last few columns.
The inner select should look something like this:
select p.partnumberp, projectid, sum(remainderint) stockint
from inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid = ind.inventoryinid)
group by partnumberp, projectid
After the join:
select T1.partnumberp, T1.projectid, p2.model, p2.descriptionsmall, p2.brandname, T1.stockint
from
(select p.partnumberp, projectid, sum(remainderint) stockint
from inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid = ind.inventoryinid)
group by partnumberp, projectid) T1
left join purchaseorders.product p2 on (p2.partnumberp = T1.partnumberp)
Is descriptiondetail a really large column? Sounds like it could be a lot of text compared to the other fields based on its name, so maybe it just takes a lot more time to read from disk, but if you could post the schema detail for the purchaseorders.product table or maybe the average length of that column that would help.
Otherswise I would try running the query a few times and see you consistently get the same time results. Could just be load on the database server the time you got the slower result.