I'm trying to speed up a MySQL query. The listings table has several million rows. If I don't sort the results I get them in 0.1 seconds, but once I sort it takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date DESC
LIMIT 0,10;
EXPLAIN SELECT: Using Index l=date, loc=primary, lc=primary
Such performance questions are really difficult to answer and depend on the setup, indexes, etc. So there will likely not be one single solution, and there aren't really correct or incorrect attempts to improve the speed; it is a lot of trial and error. Anyway, some points I noted which often cause performance issues are:
Avoid conditions within joins that should be placed in the WHERE clause instead. A join should only contain the columns being joined, no further conditions. So the lc.cat_id='2058' condition should be moved to the WHERE clause.
Using IN is often slow. You could try replacing it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...).
Open the query execution plan and check whether there is something useful for you.
Try to find out if there are special cases/values within the affected columns which slow down your query.
Change "ORDER BY date" to "ORDER BY tablealias.date" and check if this makes a difference in performance. Even if not, it is better to read.
If you can rename the column "date", do so, because using SQL keywords as table or column names is not a good idea. I'm unsure whether this influences performance, but it should be avoided if possible.
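For example, on MySQL 8.0+ a rename could look like this (the new name is purely illustrative, and you would need to update every query and index that references the column):
ALTER TABLE listings RENAME COLUMN `date` TO date_posted;  -- date_posted is a placeholder name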
Good luck!
You can try additional indexes to speed up the query, but you'll pay a tradeoff when creating/manipulating data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, there wouldn't be a need to read the record to check the location_id when using the new index, and the same goes for the join with listings_categories: an index read would be enough.
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
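In executable form, that would be something like this (the index names are my own placeholders):
ALTER TABLE listings ADD INDEX idx_location_id (location_id, id);
ALTER TABLE listings_categories ADD INDEX idx_cat_list (cat_id, list_id);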
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before touching the table itself. Your version is probably shoveling around the whole table before sorting, and only then delivering the 10 rows.
Related
I have this query which basically goes through a bunch of tables to get me some formatted results, but I can't seem to find the bottleneck. The easiest bottleneck to spot was the ORDER BY RAND(), but performance is still bad.
The query takes from 10 to 20 seconds without the ORDER BY RAND():
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here is the EXPLAIN output, its visual portion, and a screenshot of mysqltuner (screenshots omitted).
EDIT 1
I have many indexes. Here they are:
EDIT 2:
So you guys did it: down to between 0.5 and 2.5 seconds.
I mostly followed all of your advice, changed some settings in my.cnf, and ran OPTIMIZE TABLE on my tables.
You're searching for dates in a very suboptimal way. Try this.
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, adresse_id, rawData_id) and see if it helps.
Also try an index on a (validPoint). Notice that your use of a geometry data type in that table is probably not helping anything.
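A minimal sketch of both suggestions (the index names are placeholders):
ALTER TABLE immobilier_ad_blank ADD INDEX idx_datepub_cover (datePub, prix, adresse_id, rawData_id);
ALTER TABLE Adresse ADD INDEX idx_validpoint (validPoint);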
To begin with, you have quite a lot of indexes, but many of them are not useful. Remember: more indexes mean slower INSERTs and UPDATEs. Also, MySQL is not good at using more than one index per table in complex queries. The following indexes have a cardinality < 10 and should probably be dropped.
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the exercise; there are many more.
Then you have some duplicated data.
point
lat
lng
The point field has the lat and lng in it, so the latter two are not needed. That means you can lose two more indexes, idxlat and idxlng. (I am not quite sure how idxlng appears twice in the index list for the same table.)
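Assuming lat and lng live on the Adresse table (the one holding point), dropping those two indexes would look like:
ALTER TABLE Adresse DROP INDEX idxlat, DROP INDEX idxlng;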
These optimizations will lead to an overall increase in performance for INSERTS and UPDATES and possibly for all SELECTs as well because the query planner needs to spend less time deciding which index to use.
Then we notice from your EXPLAIN that the query does not use any index on table Adresse (a). But your WHERE clause has a.validPoint = 1; clearly you need an index on it, as suggested by @Ollie-Jones.
However, I suspect that this index may have low cardinality. In that case I recommend that you create a composite index on this column plus another.
The problem is your join with (a). The table has an index, but the index can't be used, more than likely due to the sort (/group by), or possibly incompatible types. The EXPLAIN shows three quarters of a million rows examined; this means that an index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try using FORCE INDEX (id) on table a; if it doesn't work, the error may give you more information.)
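For illustration, the hint would attach to the join on a; whether the usable index is named id or is simply the PRIMARY key depends on your schema:
... LEFT JOIN Adresse AS a FORCE INDEX (PRIMARY) ON (c.adresse_id = a.id) ...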
As others have pointed out, you need an index on a.validPoint, but what about c.datePub, which is also used in the WHERE clause? Why not a multiple-column index on (datePub, adresse_id)? The index on adresse_id is already used, so a multiple-column index will be better here.
I have a table with a lot of rows, and it's very inefficient to do subqueries on. I can't wrap my head around how to do a join on the data to save time.
Here is what I have:
http://sqlfiddle.com/#!2/6ab0c/3/0
This is a bit long for a comment.
First, I think you are missing an ORDER BY in the subquery. I suspect you want order by I2.date to get the "next" row.
Second, MySQL doesn't quite offer the functionality you need. You could rewrite the query using variables, but because you don't describe what it is doing, it is hard to be sure that a rewrite would be correct. That is one way to speed up the query.
Third, this query would be much faster -- and probably fast enough -- with an index on items(location, sku, date). That index is probably all you need.
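That index could be created like so (the index name is my own placeholder):
CREATE INDEX idx_items_loc_sku_date ON Items (location, sku, date);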
SELECT I1.*, MIN(I2.exit_date)
FROM Items I1
LEFT JOIN (
SELECT date as exit_date, location, sku
FROM Items
ORDER BY date asc
) as I2
ON I2.exit_date > I1.date
AND I2.location = I1.location
AND I2.sku = I1.sku
GROUP BY I1.id
I have this simple join that works great but is HORRIBLY slow, I think because the tech table is very large. There are many rows per uid, since the table tracks a timestamp for each uid, thus the DISTINCT. What is the best way to speed this query up?
SELECT DISTINCT tech.uid,
listing.empno,
listing.firstname,
listing.lastname
FROM tech,
listing
WHERE tech.uid = listing.empno
ORDER BY listing.empno ASC
First, add an index to tech.uid and listing.empno on their respective tables.
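For example (the index names are placeholders):
CREATE INDEX idx_tech_uid ON tech (uid);
CREATE INDEX idx_listing_empno ON listing (empno);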
After you are sure there are indexes you can try to re-write your query like this:
SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
If it's still not fast enough, put the word EXPLAIN before the query to get some hints about the execution plan of the query.
EXPLAIN SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
Post the EXPLAIN results so we can get better insight.
Hope it helps,
This is a very simple query. The only thing you can do in SQL is add indexes on the fields used in the JOIN/WHERE and ORDER BY clauses (tech.uid, listing.empno), if there are no indexes already.
If the JOIN fields contain NULL values, they may ruin your performance. You should filter them in the WHERE clause (WHERE tech.uid IS NOT NULL AND listing.empno IS NOT NULL). If there are many rows with a JOIN on a NULL field, that data may produce a Cartesian product containing an enormous number of rows.
You may change MySQL configuration. There are many options useful for performance tuning, like key_buffer_size, sort_buffer_size, tmp_table_size, max_heap_table_size, read_buffer_size etc.
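For illustration, some of these can be changed at runtime; the values below are placeholders, not recommendations, and appropriate settings depend on your workload and available RAM:
SET GLOBAL sort_buffer_size = 2 * 1024 * 1024;        -- per-connection sort buffer
SET GLOBAL tmp_table_size = 64 * 1024 * 1024;         -- max size of in-memory temporary tables
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;    -- should be raised together with tmp_table_size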
I have a problem with a query which takes far too long (over two seconds just for this simple query).
At first glance it appears to be an indexing issue; all joined fields are indexed, but I cannot find what else I may need to index to speed this up. As soon as I add the fields I need to the query, it gets even slower.
SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Table row counts (~): tabledef_Jobs (108k), tabledef_JobCatLink (109k), tabledef_Companies (100), tabledef_Applications (50k)
Here you can see the Describe. 'Using temporary' appears to be what is slowing down the query:
table index screenshots:
Any help would be greatly appreciated
EDIT WITH ANSWER
Final improved query, with thanks to @Steve (marked answer). Ultimately, the query was reduced from ~22s to ~0.3s:
SELECT `jobs`.`job_id` AS `job_id` FROM
(
SELECT * FROM tabledef_Jobs as jobs ORDER BY `jobs`.`date_posted` ASC LIMIT 0 , 50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Right, I’ll have a stab at this.
It would appear that the Query Optimiser cannot use an index to fulfil the query upon the tabledef_Jobs table.
You've got an offset limit, and this in combination with your ORDER BY cannot limit the amount of data before joining. Thus it is having to group by job_id, which is a PK and fast, but then order that data (temporary table and a filesort) before limiting, throwing away the vast majority of this data, and finally joining everything else to it.
I would suggest adding a composite index to jobs on (job_id, date_posted).
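In SQL that would be (the index name is a placeholder):
ALTER TABLE tabledef_Jobs ADD INDEX idx_jobid_dateposted (job_id, date_posted);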
So firstly optimise the base query:
SELECT * FROM tabledef_Jobs
GROUP BY job_id
ORDER BY date_posted
LIMIT 0,50
Then you can combine the joins and the final structure together to make a more efficient query.
I cannot let it go by without suggesting you rethink your LIMIT offset. This is fine for small initial offsets, but when the offset starts to get large it can be a major cause of performance issues. Let's say, for example's sake, that you're using this for pagination; what happens if they want page 3,000? You will use
LIMIT 3000, 50
This will then collect 3050 rows / manipulate the data and then throw away the first 3000.
[edit 1 - In response to comments below]
I will expand with some more information that might point you in the right direction. Unfortunately there isn't a simple fix that will resolve it; you must understand why this is happening to be able to address it. Simply removing the LIMIT or ORDER BY may not work, and after all you don't want to remove them, as they're part of your query, which means they must be there for a purpose.
Optimise the simple base query first; that is usually a lot easier than working with multi-joined datasets.
Despite all the bashing it receives, there is nothing wrong with filesort. Sometimes it is the only way to execute the query. Agreed, it can be the cause of many performance issues (especially on larger data sets), but that's usually not the fault of filesort itself but of the underlying query / indexing strategy.
Within MySQL you cannot mix indexes or mix orders of the same index; performing such a task will result in a filesort.
How about as I suggested creating an index on date_posted and then using:
SELECT jobs.job_id, jobs.date_posted, jobcats.*, apps.*, company.* FROM
(
SELECT DISTINCT job_id FROM tabledef_Jobs
ORDER BY date_posted
LIMIT 0,50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
My attempt was to join the customer and order tables, and to join the lineitem and order tables. I have also indexed the c_mktsegment field. My resultant query is this. Is there anything that I can do to improve it?
select
o_shippriority,
l_orderkey,
o_orderdate,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
cust As c
join ord As o on c.c_custkey = o.o_custkey
join line As l on o.o_orderkey = l.l_orderkey
where
c_mktsegment = ':1'
and o_orderdate < date ':2'
and l_shipdate > date ':2'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate;
I don't see anything obviously wrong with this query. For good performance, you probably should have indexes on orders.o_custkey and lineitem.l_orderkey. The index on c_mktsegment will let the DB find customer records quickly, but from there you need to be able to find order and lineitem records.
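Using the table names from your query, those could be created like this (the index names are placeholders):
CREATE INDEX idx_ord_custkey ON ord (o_custkey);
CREATE INDEX idx_line_orderkey ON line (l_orderkey);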
You should run EXPLAIN to see how the DB is processing the query. This depends on many factors, including the number of records in each table and the distribution of keys, so I can't say what the plan is just by looking at the query. But if you run EXPLAIN and see that it is doing a full scan of a table, you should add an index to prevent that. That's pretty much rule #1 of query optimization.