Hi all champions out there
I am far from a guru when it comes to high performance SQL queries and and wonder in anyone can help me improve the query below. It works but takes far too long, especially since I can have 100 or so entries in the IN () part.
The code is as follows, hope you can figure out the schema enough to help.
SELECT inv.amount
FROM invoice inv
WHERE inv.invoiceID IN (
SELECT childInvoiceID
FROM invoiceRelation ir
LEFT JOIN Payment pay ON pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN ( 125886, 119293, 123497 ) )
Restructure your query to use a join instead of a subselect. Also, use an INNER JOIN instead of a LEFT JOIN to the Payment table. This is justified, since you have a WHERE filter that would filter rows without a match in the Payment table anyway.
SELECT inv.amount
FROM invoice inv
INNER JOIN invoiceRelation ir ON inv.incoiceID = ir.childInvoiceID
INNER JOIN Payment pay on pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN (...)
One way to improve performance is to have a good index on relevant columns. In your example, an index on inv.invoiceID would probably speed up the query quite a bit.
Also on pay.paymentID
Try this and see if it helps:
ALTER TABLE invoice ADD INDEX invoiceID_idx (invoiceID);
and
ALTER TABLE Payment ADD INDEX paymendID_idx (paymentID);
I think instead of first in you can use inner join.
select inv.amount from invoice inv
inner join invoiceRelation ir on (inv.invoiceID = ir.childInvoiceID)
LEFT JOIN Payment pay ON pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN (125886,119293,123497)
What if you try like this instead:
SELECT inv.amount
FROM invoice inv
inner join invoiceRelation ir on inv.invoiceID = ir.parentInvoiceID
left join Payment pay on pay.invoiceID = ir.parentInvoiceID
WHERE pay.paymentID IN ( 125886, 119293, 123497 )
(OR) better; if you sure you have a invoiceID FK in payment table then make that left join to a inner join.
In a nutshell, you should try avoiding correlated subquery at all times if you can replace that subquery to a join.
Related
in query here i have https://www.db-fiddle.com/f/32Kc3QisUEwmSM8EmULpgd/1
SELECT p.prank, d.dare
FROM dares d
INNER JOIN pranks p ON p.id = d.prank_id
WHERE d.condo_id = 1;
i have one condo with id 1 and it have unique connection to dares that has connection to pranks and unique connection to condos_pranks
and i wanna have all unique pranks from both tables and i used this query above to get relation of
dares to pranks and expected result was L,M,N - Yes,No,Maybe and it is correct but i also wanna have those in condos_pranks which ids are 1,4,5,6 = L,O,P,Q
so i tried to join the table with left join because it might not have condos_pranks row
SELECT p.prank, d.dare
FROM dares d
INNER JOIN pranks p ON p.id = d.prank_id
LEFT JOIN condos_pranks pd ON pd.condo_id = d.condo_id AND pd.prank_id = p.id
WHERE d.condo_id = 1;
but result is same as first and what i want is
prank
dare
L
Yes
M
No
N
Maybe
O
No
P
No
Q
No
with default being No = 2 if prank_id of condos_pranks is not in dares
how to connect it?
This seems like an exercise in identifying extraneous information more than anything. You are unable to join something to a table that has no key, however if you know your default then you may use something like coalesce to identify the records where there was no data to join NULL and replace them with your default.
I mentioned in a comment above that this table schema makes little sense. You have keys all over the place that doing have all sorts of circular references. If this is your derived schema, consider stopping here and revisiting the relationships. If it is not and it is something educational, which I suspect it is, disregard and recognize the logical flaws in what you are working in. Perhaps consider taking the data provided and creating a new table schema that is more normalized and uses other tables to handle the many to many and one to many relationships.
dbfiddle
SELECT
pranks.prank,
COALESCE(dares.dare, 'No')
FROM pranks LEFT OUTER JOIN
dares ON pranks.id = dares.prank_id
ORDER BY pranks.prank ASC;
clearlyclueless gave correct explanations
To achieve the result, the following SELECT can also be used:
SELECT
pranks.prank,
case
when dare is null then 'No'
else dare
end
FROM pranks LEFT OUTER JOIN
dares ON pranks.id = dares.prank_id
I have a join query which takes a lot of time to process.
SELECT
COUNT(c.id)
FROM `customers` AS `c`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
This is primarily because the table 'customers' has 2 million records.
I could go over into indexing etc. But, the larger point is, this 2.5 million could become a billion records 1 day.
I'm looking for solutions which can enhance performance.
I've given thought to
a) horizontal scalability. -: distribute the mysql table into different sections and query the count independently.
b) using composite indexes.
c) My favourite one -: Just create a seperate collection in mongodb or redis which only houses the count(output of this query) Since, the count is just 1 number. this will not require a huge size aka better query performance (Only question is, how many such queries are there, because that will increase size of the new collection)
Try this and see if it improve performance:
SELECT
COUNT(c.id)
FROM `customers` AS `c`
INNER JOIN `customer_extra` AS `cx` ON `c`.`id` = `cx`.`customer_id`
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
WHERE (c.shop_id = '12121') AND ((DATE(cx.last_email_open_date) > '2019-11-08'));
As I mention in the comment, since the condition AND ((DATE(cx.last_email_open_date) > '2019-11-08'));, already made customers table to INNER JOIN with customer_extra table, you might just change it to INNER JOIN customer_extra AS cx ON c.id = cx.customer_id and follow it with other LEFT JOIN.
The INNER JOIN will at least get the initial result to only return any customer who have last_email_open_date value based on what has been specified.
Say COUNT(*), not COUNT(c.id)
Remove these; they slow down the query without adding anything that I can see:
LEFT JOIN `setting` AS `ssh` ON `c`.`shop_id` = `ssh`.`id`
LEFT JOIN `customers_address` AS `ca` ON `ca`.`id` = `cx`.`customer_default_address_id`
LEFT JOIN `lytcustomer_tier` AS `ct` ON `cx`.`lyt_customer_tier_id` = `ct`.`id`
DATE(...) makes that test not "sargable". This works for DATE or DATETIME; and this is much faster:
cx.last_email_open_date > '2019-11-08'
Consider whether that should be >= instead of >.
Need an index on shop_id. (Please provide SHOW CREATE TABLE.)
Don't use LEFT JOIN when JOIN would work equally well.
If customer_extra is columns that should have been in customer, now is the time to move them in. That would let you use this composite index for even more performance:
INDEX(shop_id, last_email_open_date) -- in this order
With those changes, a billion rows in MySQL will probably not be a problem. If it is, there are still more fixes I can suggest.
I have a few tables recording trades by myself and trades that happen in general, separated out into tradelog and timesales respectively. Both tradelog and timesales each have a second table holding extra data about the trades. I have a column TimeSalesID in my tradelog so I can link up public trades which I participated in. Below is the query I'm running to try and get ALL of my trades and time and sales in one result. The right join is taking FOREVER though, is there a better way?
SELECT SUM(tsJoin.TradeEdge)
FROM
(SELECT * from tradelog tl JOIN slippage_processed sp ON tl.ID = sp.TradeLogID WHERE tl.TradeTIme > '2019-01-21') AS tlJoin
RIGHT JOIN
(SELECT * from timesales ts JOIN slippage_processed_timesales spt ON ts.ID = spt.TimeSalesID WHERE ts.TradeTime > '2019-01-21') AS tsJoin
ON tlJoin.TIMESalesID = tsJoin.ID
Not using subqueries would help. MySQL tends to materialize subqueries, which means that indexes are lost -- greatly impeding query plans.
I would start with:
select sum(?.TradeEdge) -- whatever table column it comes from
from timesales ts join
slippage_processed_timesales spt
on ts.ID = spt.TimeSalesID left join
tradelog tl
on ?.TIMESalesID = ?.ID left join
slippage_processed sp
on tl.ID = sp.TradeLogID and tl.TradeTIme > '2019-01-21'
where ts.TradeTime > '2019-01-21';
The ? are because I don't know the base tables where the columns are coming from. Depending on the tables, the query might need to be adjusted a bit.
Also, I don't think the outer joins are necessary for what you want to do.
You should simplify your sql to simple JOINs. Based on your query above, I think this might get you a result a little faster
SELECT SUM(TradeEdge)
FROM tradelog tl
INNER JOIN slippage_processed sp ON tl.ID = sp.TradeLogID
INNER JOIN timesales ts ON ts.ID = sp.TimeSalesID
WHERE tl.TradeTIme > '2019-01-21'
If this does not work, please post your table structure for all tables involved
For these tables here (circled)
http://imgur.com/Q1HlJ
What would the best join be to use for them , I tried using OUTER join but it would not return any rows even if there was matching data.
Thanks
QUERY
select *
from
hr
OUTER JOIN
hr
ON
hr.procedure_id = procedure.procedure_id
OUTER JOIN
staff
ON
staff.staff_id = hr.staff_id
Where hr.procedure_id = procedure.procedure_id
I would suggest reading up on JOINs. It is hard to know what you are looking for. We won't make that decision for you.
What you probably want is this:
select p.*, hr.*, s.*
from procedure p
left outer join hr on p.procedure_id = hr.procedure_id
left outer join staff s on hr.staff_id = s.staff_id
That will give you results unless there are no rows in procedure.
There are different ways that you could put these together, though, depending on what exactly you want. I don't know what "hr" stands for here, so I can't use common sense to figure that out.
Don't bother with JOIN at all, and let the query optimizer handle it for you.
select * from hr, procedure, staff
where hr.procedure_id = procedure.procedure_id and staff.staff_id = hr.staff_id
hi I am doing A query to get some product info, but there is something strange going on, the first query returns resultset fast (.1272s) but the second (note that I just added 1 column) takes forever to complete (28-35s), anyone know what is happening?
query 1
SELECT
p.partnumberp,
p.model,
p.descriptionsmall,
p.brandname,
sum(remainderint) stockint
from
inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid= ind.inventoryinid)
group by partnumberp, projectid
query 2
SELECT
p.partnumberp,
p.model,
p.descriptionsmall,
p.brandname,
p.descriptiondetail,
sum(remainderint) stockint
from
inventario_dbo.inventoryindetails inda
left join purchaseorders.product p on (p.partnumberp = inda.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid= inda.inventoryinid)
group by partnumberp, projectid
You shouldn't group by some columns and then select other columns unless you use aggregate functions. Only p.partnumberp and sum(remainderint) make sense here. You're doing a huge join and select and then the results for most rows just end up getting discarded.
You can make the query much faster by doing an inner select first and then joining that to the remaining tables to get your final result for the last few columns.
The inner select should look something like this:
select p.partnumberp, projectid, sum(remainderint) stockint
from inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid = ind.inventoryinid)
group by partnumberp, projectid
After the join:
select T1.partnumberp, T1.projectid, p2.model, p2.descriptionsmall, p2.brandname, T1.stockint
from
(select p.partnumberp, projectid, sum(remainderint) stockint
from inventario_dbo.inventoryindetails ind
left join purchaseorders.product p on (p.partnumberp = ind.partnumberp)
left join inventario_dbo.inventoryin ins on (ins.inventoryinid = ind.inventoryinid)
group by partnumberp, projectid) T1
left join purchaseorders.product p2 on (p2.partnumberp = T1.partnumberp)
Is descriptiondetail a really large column? Sounds like it could be a lot of text compared to the other fields based on its name, so maybe it just takes a lot more time to read from disk, but if you could post the schema detail for the purchaseorders.product table or maybe the average length of that column that would help.
Otherswise I would try running the query a few times and see you consistently get the same time results. Could just be load on the database server the time you got the slower result.