So, I have these 2 tables. One contains, let's call them, bugs, and the other contains solutions. One bug may have more than one solution (or state, if you will), and I'd like to get the latest one.
Right now, I'm doing it like this:
Select b.*, ap.approveMistakeId
from bugs_table as b
LEFT JOIN (select approveId, approveCause, approveDate, approveInfo,
approveUsername, mistakeId as approveMistakeId
from approve_table
order by approveDate desc) AS ap
ON m.mistakeId = ap.approveMistakeId
GROUP BY b.bugId
This is not the complete query; it is just to show how I'm approaching it. The real query joins more than 10 tables.
The problem is that with this subselect the query runs for about 3.3s and returns ~2.4K records. Without the subselect it runs for 0.4s, which is more acceptable. I have created an index on approveDate, but that didn't seem to solve the problem.
Could the solution be a view created from this subselect and then joined to the main query? Or is there a way to do this in some other manner?
Thanks!
To join two tables, you essentially need a join condition. If I read m.mistakeId = ap.approveMistakeId as b.mistakeId = ap.approveMistakeId, then you can use a simpler join like:
Select b.*, ap.approveMistakeId, ap.approveDate
from bugs_table as b
INNER JOIN approve_table as ap ON b.mistakeId = ap.approveMistakeId
order by ap.approveDate desc
The problem was the sub-select. The solution: de-normalization. Yes, I know that isn't the best solution, but it is definitely the fastest.
I tried aggregate self-sorting, but that only added time. Same with joining to a view.
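For readers who want to stay normalized, here is a minimal sketch of the usual "latest row per group" pattern: aggregate MAX(approveDate) per mistakeId, then join back to fetch the rest of that row. It is not what the poster ended up doing. It runs on SQLite through Python only so that it is self-contained; the sample rows, the alias newest and the index name approve_mistake_date are made up for illustration, and MySQL timings may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bugs_table (bugId INTEGER PRIMARY KEY, mistakeId INTEGER);
CREATE TABLE approve_table (
    approveId INTEGER PRIMARY KEY,
    mistakeId INTEGER,
    approveDate TEXT,
    approveInfo TEXT
);
-- Hypothetical composite index: lets the per-mistake MAX(approveDate) use the index.
CREATE INDEX approve_mistake_date ON approve_table (mistakeId, approveDate);

INSERT INTO bugs_table VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO approve_table VALUES
    (1, 10, '2013-01-01', 'first try'),
    (2, 10, '2013-02-01', 'final fix'),
    (3, 20, '2013-01-15', 'only fix');
""")

latest_sql = """
SELECT b.bugId, ap.approveDate, ap.approveInfo
FROM bugs_table AS b
LEFT JOIN (
    -- one row per mistakeId holding its newest approveDate
    SELECT mistakeId, MAX(approveDate) AS lastDate
    FROM approve_table
    GROUP BY mistakeId
) AS newest ON newest.mistakeId = b.mistakeId
LEFT JOIN approve_table AS ap
       ON ap.mistakeId = newest.mistakeId
      AND ap.approveDate = newest.lastDate
ORDER BY b.bugId
"""
latest_rows = conn.execute(latest_sql).fetchall()
for row in latest_rows:
    print(row)
```

Bugs without any approval still appear, with NULLs in the approval columns. If two approvals for the same mistake share the same approveDate, both come back, so a tie-breaker (for example the highest approveId) would be needed in that case.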
I'm working through the JOIN tutorial on SQL zoo.
Let's say I'm about to execute the code below:
SELECT a.stadium, COUNT(g.matchid)
FROM game a
JOIN goal g
ON g.matchid = a.id
GROUP BY a.stadium
As it happens, it produces the same output as the code below:
SELECT a.stadium, COUNT(g.matchid)
FROM goal g
JOIN game a
ON g.matchid = a.id
GROUP BY a.stadium
So then, when does it matter which table you assign at FROM and which one you assign at JOIN?
When you are using an INNER JOIN like you are here, the order doesn't matter. That is because you are matching rows from two tables on a common column, so the order in which you list them is up to you. You should pick the order that is most logical to you and easiest to read. A habit of mine is to put the table I'm selecting from first. In your case, you're selecting information about a stadium, which comes from the game table, so my preference would be to put that first.
In other joins, however, such as LEFT OUTER JOIN and RIGHT OUTER JOIN, the order will matter. That is because these joins select all rows from one of the tables. Consider, for example, a table for students and a table for projects. They can exist independently: some students may have an associated project, but not all will.
If I want to get all students and project information while still seeing students without projects, I need a LEFT JOIN:
SELECT s.name, p.project
FROM student s
LEFT JOIN project p ON p.student_id = s.id;
Note here that the LEFT JOIN refers to the table in the FROM clause, which means that all students are selected. This also means that p.project will be null for some rows. Order matters here.
If I took the same concept with a RIGHT JOIN, it will select all rows from the table in the join clause. So if I changed the query to this:
SELECT s.name, p.project
FROM student s
RIGHT JOIN project p ON p.student_id = s.id;
This will return all rows from the project table, regardless of whether or not it has a match for students. This means that in some rows, s.name will be null. Similar to the first example, because I've made project the outer joined table, p.project will never be null (assuming it isn't in the original table). In the first example, s.name should never be null.
In the case of outer joins, order will matter. Thankfully, you can think intuitively with LEFT and RIGHT joins. A left join will return all rows in the table to the left of that statement, while a right join returns all rows from the right of that statement. Take this as a rule of thumb, but be careful. You might want to develop a pattern to be consistent with yourself, as I mentioned earlier, so these queries are easier for you to understand later on.
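To see this with actual rows, here is a small sketch using the student and project tables from above. It runs on SQLite through Python so it is self-contained; the sample data is invented. Because older SQLite versions lack RIGHT JOIN, the second outer join is written as project LEFT JOIN student, which is equivalent to student RIGHT JOIN project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (id INTEGER PRIMARY KEY, student_id INTEGER, project TEXT);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');              -- Bob has no project
INSERT INTO project VALUES (1, 1, 'Robot'), (2, NULL, 'Unassigned');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# INNER JOIN: swapping the two tables gives the same rows.
inner_ab = run("SELECT s.name, p.project FROM student s "
               "JOIN project p ON p.student_id = s.id ORDER BY 1, 2")
inner_ba = run("SELECT s.name, p.project FROM project p "
               "JOIN student s ON p.student_id = s.id ORDER BY 1, 2")

# LEFT JOIN: every row of the table in FROM is kept, so the order matters.
left_students = run("SELECT s.name, p.project FROM student s "
                    "LEFT JOIN project p ON p.student_id = s.id ORDER BY s.id")
left_projects = run("SELECT s.name, p.project FROM project p "
                    "LEFT JOIN student s ON p.student_id = s.id ORDER BY p.id")

print(inner_ab, inner_ba)
print(left_students)
print(left_projects)
```

The inner joins agree, while the two outer joins keep different unmatched rows: Bob without a project in one, the unassigned project without a student in the other.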
When you only JOIN 2 tables, usually the order does not matter: MySQL scans the tables in the optimal order.
When you scan more than 2 tables, the order could matter:
SELECT ...
FROM a
JOIN b ON ...
JOIN c ON ...
Also, MySQL tries to scan the tables in the order its optimizer estimates to be cheapest. But if a join is slow, it is possible that MySQL is scanning them in a non-optimal order. You can check this with EXPLAIN. In that case, you can force the join order by adding the STRAIGHT_JOIN keyword.
The order doesn't always matter. I usually just order the tables in a way that makes sense to someone reading the query.
Sometimes order does matter. Try it with LEFT JOIN and RIGHT JOIN.
In this instance you are using an INNER JOIN, if you're expecting a match on a common ID or foreign key, it probably doesn't matter too much.
You would however need to specify the tables the correct way round if you were performing an OUTER JOIN, as not all records in this type of join are guaranteed to match via the same field.
Yes, it will matter when you use another kind of join, such as LEFT JOIN or RIGHT JOIN.
Currently you are using a plain JOIN, which is an INNER JOIN (not a NATURAL JOIN): it returns only rows that match in both tables, so a row without a match is excluded from the result.
If you use a LEFT / RIGHT [OUTER] JOIN, the result will be different.
this is my query from my source code
SELECT `truyen`.*, MAX(chapter.chapter) AS last_chapter
FROM (`truyen`)
LEFT JOIN `chapter` ON `chapter`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE \'%%\'
GROUP BY `truyen`.`Id`
LIMIT 250
When I install it on an iFastnet host, it causes over 500,000 rows to be examined due to the join, and the query gets blocked (it would use over 100% of a CPU, which ultimately would cause server instability).
I also tried adding this line before the query. It fixed the problem above but led to another issue: some functions no longer run correctly.
mysql_query("SET SQL_BIG_SELECTS=1");
How can I fix this problem without buying another hosting plan?
Thanks.
You might be looking for an INNER JOIN. That would remove results that do not match. I find INNER JOINs to be faster than LEFT JOINs.
However, I'm not sure what results you are actually looking for. But because you are using the GROUP BY, it looks like the INNER JOIN might work for you.
One thing I would recommend is to copy the query that your code generates and run it in your SQL client with DESCRIBE in front of it.
So if the query ended up being:
SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
You would type:
DESCRIBE SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
This will tell you whether you could add an index to your table to make the JOIN faster.
I hope this at least points you in the right direction.
Michael Berkowski seems to agree with the indexing, which you will be able to see from the DESCRIBE.
Please check whether you have indexes on chapter.chapter and chapter.truyen. If not, add them and try again. If that doesn't help, try these suggestions:
Could you permanently flag the last chapter in a column of your chapter table on insert/update? Then you could use that flag to reduce the joined rows and drop the GROUP BY. Maybe like this:
SELECT `truyen`.*, `chapter`.`chapter` as `last_chapter`
FROM `truyen`, `chapter`
WHERE `chapter`.`truyen` = `truyen`.`Id`
AND `chapter`.`flag_last_chapter` = 1
AND `truyen`.`title` LIKE '%queryString%'
LIMIT 250
Or create a new table for that instead:
INSERT INTO new_table (truyen, last_chapter)
SELECT truyen, MAX(chapter) FROM chapter GROUP BY truyen;
SELECT `truyen`.*, `new_table`.`last_chapter`
FROM (`truyen`)
LEFT JOIN `new_table` ON `new_table`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE '%queryString%'
GROUP BY `truyen`.`Id`
LIMIT 250
Otherwise you could just fetch the 250 rows of truyen, collect the truyen ids in an array, and build a second SQL statement to select the matching rows from the chapter table. I saw in your original question that you can use PHP for that, so you could merge the results afterwards:
SELECT * FROM truyen
WHERE title LIKE '%queryString%'
LIMIT 250;
SELECT truyen, MAX(chapter) AS last_chapter
FROM chapter
WHERE truyen IN (comma_separated_ids_from_first_select)
GROUP BY truyen;
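The two-step approach can be sketched end to end as follows. The answer suggests PHP; this sketch uses Python with an in-memory SQLite database purely so it can run on its own. The sample rows and the index name chapter_truyen_chapter are invented, and the ids are passed as bound parameters instead of being pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE truyen (Id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE chapter (Id INTEGER PRIMARY KEY, truyen INTEGER, chapter INTEGER);
-- Hypothetical index covering both the lookup and the MAX().
CREATE INDEX chapter_truyen_chapter ON chapter (truyen, chapter);
INSERT INTO truyen VALUES
    (1, 'Dragon Tale'), (2, 'Sea Story'), (3, 'Dragon Song'), (4, 'Dragon Egg');
INSERT INTO chapter (truyen, chapter) VALUES (1, 1), (1, 2), (1, 3), (3, 1);
""")

# Step 1: fetch at most 250 stories; no join involved.
stories = conn.execute(
    "SELECT Id, title FROM truyen WHERE title LIKE ? ORDER BY Id LIMIT 250",
    ("%Dragon%",),
).fetchall()

# Step 2: one grouped query for the last chapter of just those stories.
ids = [story_id for story_id, _ in stories]
last_chapter = {}
if ids:  # an empty IN () list would be a syntax error
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        "SELECT truyen, MAX(chapter) FROM chapter "
        f"WHERE truyen IN ({placeholders}) GROUP BY truyen",
        ids,
    )
    last_chapter = dict(rows)

# Step 3: merge in application code; stories without chapters get None.
merged = [(story_id, title, last_chapter.get(story_id)) for story_id, title in stories]
for row in merged:
    print(row)
```

The merge step plays the role of the LEFT JOIN: a story with no chapters is kept and simply has no last chapter.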
I did not write this query. I am working on someone else's old code. I am looking into changing what is needed for this query, but if I could simply speed it up, that would solve my problem temporarily. I am looking at adding indexes. When I ran SHOW INDEXES, there were already a lot of indexes on the orders table. Can that also slow down a query?
I am no database expert. I guess I will learn more from this effort. :)
SELECT
orders.ORD_ID,
orders.ORD_TotalAmt,
orders.PAYMETH_ID,
orders.SCHOOL_ID,
orders.ORD_AddedOn,
orders.AMAZON_PurchaseDate,
orders.ORDSTATUS_ID,
orders.ORD_InvoiceNumber,
orders.ORD_CustFirstName,
orders.ORD_CustLastName,
orders.AMAZON_ORD_ID,
orders.ORD_TrackingNumber,
orders.ORD_SHIPPINGCNTRY_ID,
orders.AMAZON_IsExpedited,
orders.ORD_ShippingStreet1,
orders.ORD_ShippingStreet2,
orders.ORD_ShippingCity,
orders.ORD_ShippingStateProv,
orders.ORD_ShippingZipPostalCode,
orders.CUST_ID,
orders.ORD_ShippingName,
orders.AMAZON_ShipOption,
orders.ORD_ShipLabelGenOn,
orders.ORD_SHIPLABELGEN,
orders.ORD_AddressVerified,
orders.ORD_IsResidential,
orderstatuses.ORDSTATUS_Name,
paymentmethods.PAYMETH_Name,
shippingoptions.SHIPOPT_Name,
SUM(orderitems.ORDITEM_Qty) AS ORD_ItemCnt,
SUM(orderitems.ORDITEM_Weight * orderitems.ORDITEM_Qty) AS ORD_ItemTotalWeight
FROM
orders
LEFT JOIN orderstatuses ON
orders.ORDSTATUS_ID = orderstatuses.ORDSTATUS_ID
LEFT JOIN orderitems ON
orders.ORD_ID = orderitems.ORD_ID
LEFT JOIN paymentmethods ON
orders.PAYMETH_ID = paymentmethods.PAYMETH_ID
LEFT JOIN shippingoptions ON
orders.SHIPOPT_ID = shippingoptions.SHIPOPT_ID
WHERE
(orders.AMAZON_ORD_ID IS NOT NULL AND (orders.ORD_SHIPLABELGEN IS NULL OR orders.ORD_SHIPLABELGEN = '') AND orderstatuses.ORDSTATUS_ID <> 101 AND orderstatuses.ORDSTATUS_ID <> 40)
GROUP BY
orders.ORD_ID,
orders.ORD_TotalAmt,
orders.PAYMETH_ID,
orders.SCHOOL_ID,
orders.ORD_AddedOn,
orders.ORDSTATUS_ID,
orders.ORD_InvoiceNumber,
orders.ORD_CustFirstName,
orders.ORD_CustLastName,
orderstatuses.ORDSTATUS_Name,
paymentmethods.PAYMETH_Name,
shippingoptions.SHIPOPT_Name
ORDER BY
orders.ORD_ID
One simple thing you should consider is whether you really need left joins, or whether inner joins would do for some of them. The new query would not be the same as the original query, so you would need to think carefully about what you really want back. If your foreign key relationships are indexed correctly, this could help substantially, especially between ORDERS and ORDERITEMS, because I would imagine these are your largest tables. The following post has a good explanation: INNER JOIN vs LEFT JOIN performance in SQL Server. There are lots of other things that can be done, but you will need to post the query plan so people can dive deeper.
It looks like just adding the index was all that was needed.
create index orderitems_ORD_ID_index on orderitems(ORD_ID);
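A rough way to confirm that an index like this is actually picked up is to compare the query plan before and after creating it. The sketch below does that on SQLite through Python, with a cut-down version of the orders and orderitems tables (only the columns needed here). It shows the idea, not the real plan: the original database is not SQLite, and its optimizer and plan output will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (ORD_ID INTEGER PRIMARY KEY, ORD_TotalAmt REAL);
CREATE TABLE orderitems (
    ORDITEM_ID INTEGER PRIMARY KEY,
    ORD_ID INTEGER,
    ORDITEM_Qty INTEGER
);
""")

query = """
SELECT orders.ORD_ID, SUM(orderitems.ORDITEM_Qty) AS ORD_ItemCnt
FROM orders
LEFT JOIN orderitems ON orders.ORD_ID = orderitems.ORD_ID
GROUP BY orders.ORD_ID
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)
conn.execute("CREATE INDEX orderitems_ORD_ID_index ON orderitems (ORD_ID)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

After the index exists, the plan step for orderitems should name orderitems_ORD_ID_index instead of scanning the table or building a temporary index for the join.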
Ok, here we go. There's this messy SELECT crossing other tables and ordering to get the one desired row. Basically I do the "math" inside the ORDER BY.
1 base table.
7 JOINs pointing to local tables.
WHERE with 2 clauses and a NOT IN crossing another table.
You'll see in the code the ORDER BY is pretty damn big/ugly, it sums the result of 5 different calculations. I need that result to order by those calculations in order to get the worst row-case.
The problem is that once I execute the stored procedure, it takes up to 8 seconds to run. That's unacceptable, so I'm starting to check indexes.
So, I'm looking for advice on how to make this query run faster.
I'm indexing the columns in the WHERE clauses and the field LINEA. Should I index something else, like the columns I'm using in the JOINs? Or should I approach the query differently?
Query:
SET #LINEA = (
SELECT TOP 1
BOA.LIN
FROM
BAND_BA BOA
LEFT JOIN
TEL PAR
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(PAR.Te,2,10)
LEFT JOIN
TELP CLP
ON REPLACE(BOA.Lin,'-','') = SUBSTRING(CLP.Numtel,2,10)
LEFT JOIN
CA C
ON REPLACE(BOA.Lin,'-','') = C.An
LEFT JOIN
RE R
ON REPLACE(BOA.Lin,'-','') = R.Lin
LEFT JOIN
PRODUCTOS2 P2
ON BOA.PRODUCTO = P2.codigo
LEFT JOIN
EN
ON REPLACE(BOA.Lin,'-','') = EN.G
LEFT JOIN
TIP ID
ON TIPID = ID.ID
WHERE
BOA.EST = 'C' AND
ID.SE = 'boA' AND
BOA.LIN NOT IN (
SELECT
LIN
FROM
BAN
)
ORDER BY (EN.VALUE + ANT.VALUE + REIT.VAL + C.VALUE + TEL.VALUE
) DESC,
I'll be frank, this is some pretty terrible SQL. Without seeing all your table structures, advice here will be incomplete. That being said, please don't post all your table structures because you are already very close to "hire a consultant" territory with this.
1) All the REPLACE logic should be done away with. If you need to JOIN on these fields, then add comparable fields to the tables so you don't need to manipulate the data. Every single JOIN that uses a REPLACE or SUBSTRING is a table or index scan - those are non-SARGable and a definite anti-pattern.
2) The ORDER BY is probably the most convoluted ORDER BY I have ever seen. Some major issues there:
- Subqueries should all be eliminated and materialized either in the outer query or as variables
- String manipulation should be eliminated (see item 1 above)
3) The entire query is basically a code smell. If you need to write code like this to meet business requirements then you either have a terribly inappropriate design or some other much larger issue in the organization or data.
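As a sketch of the "add comparable fields" advice, the example below takes the BAND_BA to TEL join from the question, stores the cleaned-up values in new columns and joins on those instead. The column names LinKey and TeKey, the index name TEL_TeKey_index and the sample phone numbers are all hypothetical. It runs on SQLite through Python to stay self-contained, so SUBSTRING is written as SUBSTR; on SQL Server a persisted computed column could play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BAND_BA (Lin TEXT);
CREATE TABLE TEL (Te TEXT, VALUE INTEGER);
INSERT INTO BAND_BA VALUES ('555-123-4567'), ('555-000-1111');
INSERT INTO TEL VALUES ('15551234567', 7), ('19998887777', 3);
""")

# Original style: functions on both sides of the join condition.
slow_rows = conn.execute("""
SELECT BOA.Lin, PAR.VALUE
FROM BAND_BA BOA
LEFT JOIN TEL PAR ON REPLACE(BOA.Lin, '-', '') = SUBSTR(PAR.Te, 2, 10)
ORDER BY BOA.Lin
""").fetchall()

# Store the normalized keys once (hypothetical columns), then index the lookup side.
conn.executescript("""
ALTER TABLE BAND_BA ADD COLUMN LinKey TEXT;
ALTER TABLE TEL ADD COLUMN TeKey TEXT;
UPDATE BAND_BA SET LinKey = REPLACE(Lin, '-', '');
UPDATE TEL SET TeKey = SUBSTR(Te, 2, 10);
CREATE INDEX TEL_TeKey_index ON TEL (TeKey);
""")

fast_sql = """
SELECT BOA.Lin, PAR.VALUE
FROM BAND_BA BOA
LEFT JOIN TEL PAR ON BOA.LinKey = PAR.TeKey
ORDER BY BOA.Lin
"""
fast_rows = conn.execute(fast_sql).fetchall()
fast_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + fast_sql)]

print(slow_rows)
print(fast_rows)
print(fast_plan)
```

Both queries return the same rows, but the second one compares plain columns, so the join can be answered with an index lookup. The new columns have to be kept in sync on insert and update, which is the price of this approach.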
One thing that can kill performance is using a lot of LEFT JOINs. To improve performance of LEFT JOIN, you might want to make sure that the column(s) to which you join have an index - that can have a huge impact on performance.
So I've asked a couple of questions about performing joins and have had great answers, but there's still something I'm completely stumped by.
I have 3 tables. Let us call them table-b, table-d and table-e.
Table-b and table-d share a column called p-id.
Table-e and table-b share a column called ev-id.
Table-e also has a column called date.
Table-b also has a unique id column called u-id.
I'd like to write a query which returns u-id under the following conditions:
1) Restricted to a certain value in table-e.date.
2) Where table-b.p-id does not match table-d.p-id.
I think I need to inner join table-b and table-e on the ev-id column. I then think I need to perform a left join on table-d and table-b where p-id is null.
My problem is that I don't know the syntax of writing this query. I know how to write multiple inner joins and I know how to write a left join. How do I combine the two?
Thanks so much to everyone who is helping me out. I'm (obviously!) a newbie to databases and am struggling to get my head around it all!
You just write the joins one after the other:
SELECT b.uid
FROM b
INNER JOIN e USING(evid)
LEFT JOIN d USING(pid)
WHERE e.date = :whatever
AND d.pid IS NULL
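I think it's something like this (with underscores in the names, since unquoted hyphens would be parsed as minus signs):
SELECT table_b.u_id
FROM table_b
INNER JOIN table_e
ON table_b.ev_id = table_e.ev_id
WHERE table_e.date = :whatever
AND table_b.p_id NOT IN (SELECT p_id FROM table_d)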
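Here is a small runnable comparison of the two approaches, LEFT JOIN ... IS NULL and NOT IN. It uses SQLite through Python so it is self-contained, assumes underscored names (table_b, table_d, table_e, u_id, p_id, ev_id) in place of the hyphenated ones, and the sample rows and date are invented. It also shows one case where the two differ: a NULL p_id in table_d.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_b (u_id INTEGER PRIMARY KEY, p_id INTEGER, ev_id INTEGER);
CREATE TABLE table_d (p_id INTEGER);
CREATE TABLE table_e (ev_id INTEGER PRIMARY KEY, date TEXT);
INSERT INTO table_e VALUES (1, '2012-05-01'), (2, '2012-06-01');
INSERT INTO table_d VALUES (100);
INSERT INTO table_b VALUES (1, 100, 1), (2, 200, 1), (3, 300, 2);
""")

params = {"whatever": "2012-05-01"}

def ids(sql):
    return [row[0] for row in conn.execute(sql, params)]

LEFT_JOIN_SQL = """
SELECT b.u_id
FROM table_b b
INNER JOIN table_e e ON e.ev_id = b.ev_id
LEFT JOIN table_d d ON d.p_id = b.p_id
WHERE e.date = :whatever
  AND d.p_id IS NULL          -- keep only rows with no match in table_d
ORDER BY b.u_id
"""

NOT_IN_SQL = """
SELECT b.u_id
FROM table_b b
INNER JOIN table_e e ON e.ev_id = b.ev_id
WHERE e.date = :whatever
  AND b.p_id NOT IN (SELECT p_id FROM table_d)
ORDER BY b.u_id
"""

left_join_ids = ids(LEFT_JOIN_SQL)
not_in_ids = ids(NOT_IN_SQL)

# A NULL in the subquery makes every NOT IN comparison unknown, so no row passes.
conn.execute("INSERT INTO table_d VALUES (NULL)")
left_join_ids_with_null = ids(LEFT_JOIN_SQL)
not_in_ids_with_null = ids(NOT_IN_SQL)

print(left_join_ids, not_in_ids)
print(left_join_ids_with_null, not_in_ids_with_null)
```

With clean data both queries agree. If table_d.p_id can be NULL, the LEFT JOIN form (or NOT EXISTS) is the safer choice.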