Very slow Left Join on small/medium tables - mysql

I have an issue with a particular left join slowing down an important query drastically. Using PHPMyAdmin to test the query, it states that the query took 3.9 seconds to return 546 rows, though I had to wait 67 seconds to see the results which struck me as odd.
There are two tables involved:
nsi (1,553 rows) and files (233,561 rows). All columns mentioned in this query are indexed individually as well as there being a compound index on filejobid, category and isactive in the files table. Everything being compared is an integer as well.
The goal of this watered down version of the query is to display the row from the nsi table once and be able to determine if someone has uploaded a file in category 20 or not. There can be multiple files but should only be one row, hence the grouping.
The query:
SELECT
nsi.id AS id,
f.id AS filein
FROM nsi nsi
LEFT JOIN files f
ON f.filejobid=nsi.leadid AND f.category=20 AND f.isactive=1
WHERE nsi.isactive=1
GROUP BY nsi.id
The 67 second load time for this data is simply unacceptable for my application and I'm at a loss as to how to optimize it any further. I've indexed and compound indexed. Should I just be looking into a new more roundabout solution instead?
Thank you for any help!

This is your query, which I find a bit suspicious, because you have an aggregation but not aggregation function on f.id. It will return an arbitrary matching id:
SELECT nsi.id AS id, f.id AS filein
FROM nsi LEFT JOIN
files f
ON f.filejobid = nsi.leadid AND f.category = 20 AND f.isactive = 1
WHERE nsi.isactive = 1
GROUP BY nsi.id;
For this query, I think the best indexes are files(filejobid), category, isactive) and nsi(isactive, filejobid, id).
However you can easily rewrite the query to be more efficient, because it doesn't need the group by (assuming nsi.id is unique):
SELECT nsi.id AS id,
(SELECT f.id
FROM files f
WHERE f.filejobid = nsi.leadid AND f.category = 20 AND f.isactive = 1
LIMIT 1
) AS filein
FROM nsi
WHERE nsi.isactive = 1;
The same indexes would work for this.
If you want a list of matching files, rather than just one, then use group_concat() in either query.

Related

Cross-Apply bad for a larger database or alternatives perform better?

so 2 (more so 3) questions, is my query just badly coded or thought out ? (be kind, I only just discovered cross apply and relatively new) and is corss-apply even the best sort of join to be using or why is it slow?
So I have a database table (test_tble) of around 66 million records. I then have a ##Temp_tble created which has one column called Ordr_nbr (nchar (13)). This is basically ones I wish to find.
The test_tble has 4 columns (Ordr_nbr, destination, shelf_no, dte_bought).
This is my current query which works the exact way I want it to but it seems to be quite slow performance.
select ##Temp_tble.Ordr_nbr, test_table1.destination, test_table1.shelf_no,test_table1.dte_bought
from ##MyTempTable
cross apply(
select top 1 test_table.destination,Test_Table.shelf_no,Test_Table.dte_bought
from Test_Table
where ##MyTempTable.Order_nbr = Test_Table.order_nbr
order by dte_bought desc)test_table1
If the ##Temp_tble only has 17 orders to search for it take around 2 mins. As you can see I'm trying to get just the most recent dte_bought or to some max(dte_bought) of each order.
In term of index I ran database engine tuner and it says its optimized for the query and I have all relative indexes created such as clustered index on test_tble for dte_bought desc including order_nbr etc.
The execution plan is using a index scan(on non_clustered) and a key lookup(on clustered).
My end result is it to return all the order_nbrs in ##MyTempTble along with columns of destination, shelf_no, dte_bought in relation to that order_nbr, but only the most recent bought ones.
Sorry if I explained this awfully, any info needed that I can provide just ask. I'm not asking for just downright "give me code", more of guidance,advice and learning. Thank you in advance.
UPDATE
I have now tried a sort of left join, it works reasonably quicker but still not instant or very fast (about 30 seconds) and it also doesn't return just the most recent dte_bought, any ideas? see below for left join code.
select a.Order_Nbr,b.Destination,b.LnePos,b.Dte_bought
from ##MyTempTble a
left join Test_Table b
on a.Order_Nbr = b.Order_Nbr
where b.Destination is not null
UPDATE 2
Attempted another let join with a max dte_bought, works very but only returns the order_nbr, the other columns are NULL. Any suggestion?
select a.Order_nbr,b.Destination,b.Shelf_no,b.Dte_Bought
from ##MyTempTable a
left join
(select * from Test_Table where Dte_bought = (
select max(dte_bought) from Test_Table)
)b on b.Order_nbr = a.Order_nbr
order by Dte_bought asc
K.M
Instead of CROSS APPLY() you can use INNER JOIN with subquery. Check the following query :
SELECT
TempT.Ordr_nbr
,TestT.destination
,TestT.shelf_no
,TestT.dte_bought
FROM ##MyTempTable TempT
INNER JOIN (
SELECT T.destination
,T.shelf_no
,T.dte_bought
,ROW_NUMBER() OVER(PARTITION BY T.Order_nbr ORDER BY T.dte_bought DESC) ID
FROM Test_Table T
) TestT
ON TestT.Id=1 AND TempT.Order_nbr = TestT.order_nbr

MySQL Slow performance with count() of matching records in a joined table

There is a table called "basket_status" in the query below. For each record in basket_status, a count of yarn balls in the basket is being made from another table (yarn_ball_updates).
The basket_status table has 761 rows. The yarn_ball_updates table has 1,204,294 records. Running the query below takes about 30 seconds to 60 seconds (depending on how busy the server is) and returns 750 rows. Obviously my problem is doing a match against 1,204,294 records for all of the 761 basket_status records.
I tried making a view based on the query but offered no performance increase. I believe I read that for views you can't have sub queries and complex joins.
What direction should I take to speed up this query? I've never made a MySQL scheduled task or anything, but it seems like the "basket_status" table should have a "yarn_ball_count" count already in it, and an automated process should be updating that new extra count() column maybe?
Thanks for any help or direction.
SELECT p.id, p.basket_name, p.high_quality, p.yarn_ball_count
FROM (
SELECT q.id, q.basket_name, q.high_quality,
CAST(SUM(IF (q.report_date = mxd.mxdate,1,0)) AS CHAR) yarn_ball_count
FROM (
SELECT bs.id, bs.basket_name, bs.high_quality,ybu.report_date
FROM yb.basket_status bs
JOIN yb.yarn_ball_updates ybu ON bs.basket_name = ybu.alpha_pmn
) q,
(SELECT MAX(ybu.report_date) mxdate FROM yb.yarn_ball_updates ybu) mxd
GROUP BY q.basket_name, q.high_quality ) p
I don't think you need nested queries for this. I'm not a MySQL developer but won't this work?
SELECT bs.id, bs.basket_name, bs.high_quality, count(*) yarn_ball_count
FROM yb.basket_status bs
JOIN yb.yarn_ball_updates ybu ON bs.basket_name = ybu.alpha_pmn
JOIN (SELECT MAX(ybu.report_date) mxdate FROM yb.yarn_ball_updates) mxd ON ybu.report_date = mxd.mxdate
GROUP BY bs.basket_name, bs.high_quality

Optimizing MySQL query with subselect

I am trying to make the following query run faster than 180 secs:
SELECT
x.di_on_g AS deviceid, SUM(1) AS amount
FROM
(SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN ("many (~1500) comma separated IDs coming from my code"))
GROUP BY g.device_id , g.guide_type_id) x
GROUP BY x.di_on_g
ORDER BY amount;
Screenshot from EXPLAIN:
https://ibb.co/da5oAF
Even if I run the subquery as separate query it is still very slow...:
SELECT
g.device_id AS di_on_g
FROM
guide g
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
WHERE
g.operator_id IN (1 , 1)
AND g.locale_id = 1
AND (g.device_id IN (("many (~1500) comma separated IDs coming from my code")
Screenshot from EXPLAIN:
ibb.co/gJHRVF
I have indexes on g.device_id and on other appropriate places.
Indexes:
SHOW INDEX FROM guide;
ibb.co/eVgmVF
SHOW INDEX FROM operator_guide_type;
ibb.co/f0TTcv
SHOW INDEX FROM operator_device;
ibb.co/mseqqF
I tried creating a new temp table for the ids and using a JOIN to replace the slow IN clause but that didn't make the query much faster.
All IDs are Integers and I tried creating a new temp table for the ids that come from my code and JOIN that table instead of the slow IN clause but that didn't make the query much faster. (10 secs faster)
None of the tables have more then 300,000 rows and the mysql configuration is good.
And the visual plan:
Query Plan
Any help will be appreciated !
Let's focus on the subquery. The main problem is "inflate-deflate", but I will get to that in a moment.
Add the composite index:
INDEX(locale_id, operator_id, device_id)
Why the duplicated "1" in
g.operator_id IN (1 , 1)
Why does the GROUP BY have 2 columns, when you select only 1? Is there some reason for using GROUP BY instead of DISTINCT. (The latter seems to be your intent.)
The only reason for these
INNER JOIN operator_guide_type ogt ON ogt.guide_type_id = g.guide_type_id
INNER JOIN operator_device od ON od.device_id = g.device_id
would be to verify that there are guides and devices in those other table. Is that correct? Are these the PRIMARY KEYs, hence unique?: ogt.guide_type_id and od.device_id. If so, why do you need the GROUP BY? Based on the EXPLAIN, it sounds like both of those are related 1:many. So...
SELECT g.device_id AS di_on_g
FROM guide g
WHERE EXISTS( SELECT * FROM operator_guide_type WHERE guide_type_id = g.guide_type_id )
AND EXISTS( SELECT * FROM operator_device WHERE device_id = g.device_id
AND g.operator_id IN (1)
AND g.locale_id = 1
AND g.device_id IN (...)
Notes:
The GROUP BY is no longer needed.
The "inflate-deflate" of JOIN + GROUP BY is gone. The Explain points this out -- 139K rows inflated to 61M -- very costly.
EXISTS is a "semijoin", meaning that it does not collect all matches, but stops when it finds any match.
"the mysql configuration is good" -- How much RAM do you have? What Engine is the table? What is the value of innodb_buffer_pool_size?

Multiple left joins and performance

I have following tables:
products - 4500 records
Fields: id, sku, name, alias, price, special_price, quantity, desc, photo, manufacturer_id, model_id, hits, publishing
products_attribute_rel - 35000 records
Fields: id, product_id, attribute_id, attribute_val_id
attribute_values - 243 records
Fields: id, attr_id, value, ordering
manufacturers - 29 records
Fields: id, title,publishing
models - 946 records
Fields: id, manufacturer_id, title, publishing
So I get data from these tables by one query:
SELECT jp.*,
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
FROM `products` AS jp
LEFT JOIN `products_attribute_rel` AS jpar ON jpar.product_id = jp.id
LEFT JOIN `attribute_values` AS jav ON jav.attr_id = jpar.attribute_val_id
LEFT JOIN `manufacturers` AS jm ON jm.id = jp.manufacturer_id
LEFT JOIN `models` AS jmo ON jmo.id = jp.model_id
GROUP BY jp.id HAVING COUNT(DISTINCT jpar.attribute_val_id) >= 0
This query is slow as hell. It takes hundreds of seconds mysql to handle it.
So how it would be possible to improve this query ? With small data chunks it works
perfectly well. But I guess everything ruins products_attribute_rel table, which
has 35000 records.
Your help would be appreciated.
EDITED
EXPLAIN results of the SELECT query:
The problem is that MySQL uses the join-type ALL for 3 tables. That means that MySQL performs 3 full table scans, puts every possibility together before sorting those out that don't match the ON statement. To get a much faster join-type (for instance eq_ref), you must put an index on the coloumns that are used on the ON statements.
Be aware though that putting an index on every possible coloumn is not recommended. A lot of indexes do speed up SELECT statements, however it also creates an overhead since the index must be stored and managed. This means that manipulation queries like UPDATE and DELETE are much slower. I've seen queries deleting only 1000 records in half an hour. It's a trade-off where you have to decide what happens more often and what is more important.
To get more infos on MySQL join-types, take a look at this.
More on indexes here.
Tables data is not so much huge that it's taking hundreds of seconds. Something is wrong with table schema. Please do proper indexing. That will surly speed up.
select distinct
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
from products jp,
products_attribute_rel jpar,
attribute_values jav,
manufacturers jm
models jmo
where jpar.product_id = jp.id
and jav.attr_id = jpar.attribute_val_id
and jm.id = jp.manufacturer_id
and jmo.id = jp.model_id
you can do that if you want to select all the data. Hope it works.

SQL Query displaying duplicate results suddenly

I am stumped after a day of troubleshooting. This MySQL query has been running successfully for a month and all of the sudden I am receiving duplicate results which is fouling up the entire management program it is relevant to.
My troubleshooting included searching the database for duplicate entries and none existed. For example, I used the search command in phpMyAdmin to display the entries where entry_id = 45 and field_id = 65. The result of the search displayed only one result which is correct. As of yesterday the below query displays the same result twice.
Query:
SELECT f.id, f.title, f.type, f.name, v.id AS f_id, v.field_value
FROM jos_directory_enf AS v
LEFT JOIN jos_directory_field AS f ON f.id = v.field_id
WHERE v.entry_id = 45 AND v.field_id = 65
You may want to see if your joining tables have any duplicates in them that will result in more results to your main query.
It is difficult to answer your question without knowing details about your schema and the duplicated result. I guess your data was updated since the time where no duplicated were presented.
Look if v.id and v.field_value contain NULL values more than once for the same v.field_id or there rows with the same values for the columns, which you project in one of the two tables. Look if f.id or v.field_id have the same values in different rows.