Optimizing MySQL Query With MAX() - mysql

Apologies if this has been asked before, but is there any way at all I can optimize this query to run faster? At the moment it takes about 2 seconds, which, while not a huge amount, makes it the slowest query on my site; all other queries take less than 0.5 seconds.
Here is my query:
SELECT SQL_CALC_FOUND_ROWS MAX(images.id) AS maxID, celebrity.* FROM images
JOIN celebrity ON images.celeb_id = celebrity.id
GROUP BY images.celeb_id
ORDER BY maxID DESC
LIMIT 0,20
Here is an explain:
1 SIMPLE celebrity ALL PRIMARY NULL NULL NULL 536 Using temporary; Using filesort
1 SIMPLE images ref celeb_id celeb_id 4 celeborama_ignite.celebrity.id 191
I'm at a loss as to how to improve the performance of this query further. I'm not super familiar with MySQL, but I do know it is slow because I am sorting on the value produced by MAX(), which has no index. I can't drop that sort, since it is what gives me the results I need, but is there something else I can do to stop it slowing down the query?
Thanks.

If you really need a fast solution, then don't run such queries at runtime.
Instead, add an extra column last_image_id to the celebrity table and update it whenever a new image is uploaded (via a trigger or your application logic; it doesn't matter which).
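A rough sketch of that approach (the column name last_image_id, the index name, and the trigger name are assumptions, not tested code):
-- Denormalised column holding the newest image id per celebrity
ALTER TABLE celebrity ADD COLUMN last_image_id INT NULL,
                      ADD INDEX idx_last_image (last_image_id);
-- Keep it up to date whenever a new image is inserted
CREATE TRIGGER trg_images_after_insert
AFTER INSERT ON images
FOR EACH ROW
UPDATE celebrity SET last_image_id = NEW.id WHERE id = NEW.celeb_id;
The original query then becomes a simple indexed sort: SELECT * FROM celebrity ORDER BY last_image_id DESC LIMIT 0, 20.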

I would get the latest image this way:
SELECT c.*, i.id AS image_id
FROM celebrity c
JOIN images i ON i.celeb_id = c.id
LEFT OUTER JOIN images i2 ON i2.celeb_id = c.id AND i2.id > i.id
WHERE i2.id IS NULL
ORDER BY image_id DESC
LIMIT 0,20;
In other words, try to find a row i2 for the same celebrity with a higher id than i.id. If the outer join fails to find that match, then i.id must be the max image id for the given celebrity.
SQL_CALC_FOUND_ROWS can cause queries to run extremely slowly. I've found cases where just removing SQL_CALC_FOUND_ROWS made the query run 200x faster (in other cases it makes only a small difference; it depends on the table, so you should test both ways).
If you need the equivalent of SQL_CALC_FOUND_ROWS, just run a separate query:
SELECT COUNT(*) FROM celebrity;

I think you need a compound index on (celeb_id, id) in table images (supposing it's a MyISAM table), so the GROUP BY celeb_id and MAX(id) can use this index.
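A minimal sketch of the DDL for that compound index (the index name is just a placeholder):
ALTER TABLE images ADD INDEX idx_celeb_image (celeb_id, id);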
But with big tables, you'll probably have to follow @zerkms' advice and add a new column to the celebrity table.

MySQL doesn't perform so well with joins. I would recommend dividing your query in two: in the first query select the celebrity, and in the second select the image. In short, avoid the join; a sketch follows the link below.
Check out this link - http://phpadvent.org/2011/a-stitch-in-time-saves-nine-by-paul-jones
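A rough sketch of what that two-query split could look like, keeping the ordering from the question (table and column names are taken from the question; the application has to stitch the two results together):
-- Query 1: the 20 most recent "latest image per celebrity" ids
SELECT celeb_id, MAX(id) AS maxID
FROM images
GROUP BY celeb_id
ORDER BY maxID DESC
LIMIT 0, 20;
-- Query 2: fetch those celebrities using the celeb_id values returned by query 1
SELECT * FROM celebrity WHERE id IN (/* celeb_id values from query 1 */);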

SELECT STRAIGHT_JOIN *
FROM (
SELECT MAX(id) as maxID, celeb_id as id
FROM images
GROUP BY celeb_id
ORDER by maxID DESC
LIMIT 0, 20) as ids
JOIN celebrity USING (id);
The query does not allow the row count to be precalculated, but an additional:
SELECT COUNT(DISTINCT celeb_id)
FROM images;
or even (if each celebrity has an image):
SELECT COUNT(*) FROM celebrity;
will not cost much, because it can easily be cached by the query cache (if it is not switched off).

Related

Mysql Select INNER JOIN with order by very slow

I'm trying to speed up a MySQL query. The Listings table has several million rows. If I don't sort, I get the result in 0.1 seconds, but once I sort it takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id
IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date
DESC LIMIT 0,10;
EXPLAIN SELECT: Using Index l=date, loc=primary, lc=primary
Such performance questions are really difficult to answer and depend on the setup, indexes, etc. So there will likely not be one single solution, and there are not really correct or incorrect attempts to improve the speed; it is a lot of trial and error. Anyway, here are some points I noted which often cause performance issues:
Avoid conditions within joins that should be placed in the WHERE clause instead. A join should contain only the columns being joined, no further conditions. So lc.cat_id='2058' should be moved to the WHERE clause (see the sketch after this list).
Using IN is often slow. You could try replacing it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...).
Open the query execution plan and check whether there is something useful for you.
Try to find out if there are special cases/values within the affected columns which slow down your query
Change "ORDER BY date" to "ORDER BY tablealias.date" and check if this makes a difference in performance. Even if not, it is better to read.
If you can rename the column "date", do so, because using SQL keywords as table or column names is not a good idea. I'm not sure whether this influences performance, but it should be avoided if possible.
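A minimal sketch combining the first and fifth points above (same tables and columns as the question; whether it is actually faster needs to be measured):
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc ON l.id = lc.list_id
INNER JOIN locations loc ON l.location_id = loc.id
WHERE lc.cat_id = '2058'
  AND l.location_id IN (7841, 7842, /* ... the rest of the ids from the question ... */ 7903)
ORDER BY l.date DESC
LIMIT 0, 10;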
Good luck!
You can try additional indexes to speed up the query, but you'll have a tradeoff when creating/manipulating data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, with the new index there would be no need to read the record to check location_id, and the same goes for the join with listings_categories: an index read would be enough.
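One possible DDL for those combined keys (the index names are placeholders):
ALTER TABLE listings ADD INDEX idx_date_location (`date`, location_id);
ALTER TABLE listings_categories ADD INDEX idx_cat_list (cat_id, list_id);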
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings AS l2 ON l2.id = x.id
ORDER BY l2.date DESC
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before reaching into the table itself. Your version is probably shoveling around the whole table before sorting, and only then delivering the 10.

How to make an efficient Count of rows from another table (join)

I have a database of 100,000 names in cemeteries. The cemeteries number around 6,000. I wish to return the number of names in each cemetery.
If I do an individual query, it takes a millisecond:
SELECT COUNT(*) FROM tblnames
WHERE tblcemetery_ID = 2
My actual query runs on and on, and I end up killing it so I don't bring down our database. Can someone point me at a more efficient method?
select tblcemetery.id,
(SELECT COUNT(*) FROM tblnames
WHERE tblcemetery_ID = tblcemetery.id) AS casualtyCount
from tblcemetery
ORDER BY
fldcemetery
You can rephrase your query to use a join instead of a correlated subquery:
SELECT
t1.id,
COUNT(t2.tblcemetery_ID) AS casualtyCount
FROM tblcemetery t1
LEFT JOIN tblnames t2
ON t1.id = t2.tblcemetery_ID
GROUP BY
t1.id
ORDER BY
t1.id
I have heard that in certain databases, such as Oracle, the optimizer is smart enough to figure out what I wrote above, and would refactor your query under the hood. But the MySQL optimizer might not be smart enough to do this.
One nice side effect of this refactor is that we now see an opportunity to improve performance even more, by adding indices to the join columns. I am assuming that id is the primary key of tblcemetery, in which case it is already indexed. But you could add an index to tblcemetery_ID in the tblnames table for a possible performance boost:
CREATE INDEX cmtry_idx ON tblnames (tblcemetery_ID)
It could even be done without a JOIN by using an EXISTS clause like this
SELECT id, COUNT(*) AS casualtyCount
FROM tblcemetery
WHERE EXISTS (SELECT 1 FROM tblnames WHERE tblcemetery_ID=id)
GROUP BY id
ORDER BY id
Or you could look up GROUP BY, for example, and do something like:
SELECT tblcemetery_ID, SUM(1) FROM tblnames GROUP BY tblcemetery_ID
You essentially sum up 1 for each name entry that belongs to the cemetery; since you are not interested in the names at all, there is no need to join to the cemetery detail table.
I'm not sure whether SUM(1) or COUNT(*) is better; both should work.
You'll only get cemeteries that have people in them, though.

MySQL Query with left join, group by and order by extremely slow

I have the following query, which takes around 15 seconds to execute. If I remove the ORDER BY, it takes 3 seconds, which is still way too long.
SELECT
pages.id AS id,
pages.page_title AS name,
SUM(visitors.bounce) AS bounce,
SUM(visitors.goal) AS goal,
count(visitors.id) AS volume
FROM
pages
LEFT JOIN visitors ON pages.id = visitors.page_id
GROUP BY pages.id
ORDER BY volume DESC
For readability, I slightly simplified this query from the one used in the application, but I've been testing with this simplified query and the problem still exists. So the problem is in this part.
Table pages: around 3K records. Table visitors: around 300K records.
What I have done:
I have an index on visitors.page_id (with a foreign key linking to pages.id).
Obviously my ID fields are set as primary keys.
What I have tried:
I have increased read_buffer_size, sort_buffer_size, and read_rnd_buffer_size to 64M.
EXPLAIN query with sorting (15 secs):
EXPLAIN query without sorting (3 secs, still way too long, and that's not the output I want):
Removed the SUM and COUNT calculations; they didn't really have an effect on the execution time.
Any ideas to improve this query?
My first suggestion is to do the aggregation before the join:
SELECT p.id, p.page_title AS name,
v.bounce, v.goal, v.volume
FROM pages p LEFT JOIN
(SELECT page_id, SUM(bounce) AS bounce, SUM(goal) AS goal,
COUNT(*) AS volume
FROM visitors
GROUP BY page_id
) v
ON p.id = v.page_id
ORDER BY volume DESC;
However, your query needs to do both an aggregation and a sort -- and you have no filtering. I'm not sure you'll be able to get it much faster.

MySQL: Grouped/Ordered/Left Join query very slow

I have a problem with a query which takes far too long (over two seconds just for this simple query).
At first glance it appears to be an indexing issue; all joined fields are indexed, but I cannot find what else I may need to index to speed this up. As soon as I add the fields I need to the query, it gets even slower.
SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Table row counts (~): tabledef_Jobs (108k), tabledef_JobCatLink (109k), tabledef_Companies (100), tabledef_Applications (50k)
Here you can see the Describe. 'Using temporary' appears to be what is slowing down the query:
table index screenshots:
Any help would be greatly appreciated
EDIT WITH ANSWER
Final improved query, with thanks to @Steve (marked answer). Ultimately, the query was reduced from ~22s to ~0.3s:
SELECT `jobs`.`job_id` AS `job_id` FROM
(
SELECT * FROM tabledef_Jobs as jobs ORDER BY `jobs`.`date_posted` ASC LIMIT 0 , 50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Right, I’ll have a stab at this.
It would appear that the Query Optimiser cannot use an index to fulfil the query upon the tabledef_Jobs table.
You've got an offset limit and this with the combination of your ORDER BY cannot limit the amount of data before joining and thus it is having to group by job_id which is a PK and fast – but then order that data (temporary table and a filesort) before limiting and throwing away a the vast majorly of this data before finally join everything else to it.
I would suggest adding a composite index on (job_id, date_posted) to the jobs table.
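A sketch of that composite index (the index name is just a placeholder):
ALTER TABLE tabledef_Jobs ADD INDEX idx_jobid_posted (job_id, date_posted);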
So firstly optimise the base query:
SELECT * FROM tabledef_Jobs
GROUP BY job_id
ORDER BY date_posted
LIMIT 0,50
Then you can combine the joins and the final structure together to make a more efficient query.
I cannot let it go by without suggesting you rethink your limit offset. This is fine for small initial offsets, but when it starts to get large it can be a major cause of performance issues. Let's say, for example's sake, that you're using this for pagination: what happens if they want page 3,000? You will use
LIMIT 3000, 50
This will then collect 3,050 rows, manipulate the data, and throw away the first 3,000.
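One common alternative, not proposed in this answer but worth mentioning, is "seek"/keyset pagination: remember the last date_posted and job_id seen and filter on them instead of using a large offset. A hypothetical sketch, assuming date_posted is indexed and :last_date/:last_id come from the last row of the previous page:
SELECT job_id, date_posted
FROM tabledef_Jobs
WHERE date_posted > :last_date
   OR (date_posted = :last_date AND job_id > :last_id)
ORDER BY date_posted ASC, job_id ASC
LIMIT 50;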
[edit 1 - In response to comments below]
I will expand with some more information that might point you in the right direction. Unfortunately there isn't a simple fix that will resolve it; you must understand why this is happening in order to address it. Simply removing the LIMIT or ORDER BY may not work, and after all you don't want to remove them, as they are part of your query, which means they are there for a purpose.
Optimise the simple base query first; that is usually a lot easier than working with multi-joined datasets.
Despite all the bashing it receives, there is nothing wrong with filesort. Sometimes it is the only way to execute the query. Agreed, it can be the cause of many performance issues (especially on larger data sets), but that is usually the fault not of the filesort itself but of the underlying query / indexing strategy.
Within MySQL you cannot mix indexes or mix orders of the same index – performing such a task will result in a filesort.
How about, as I suggested, creating an index on date_posted and then using:
SELECT jobs.job_id, jobs.date_posted, jobcats.*, apps.*, company.* FROM
(
SELECT DISTINCT job_id FROM tabledef_Jobs
ORDER BY date_posted
LIMIT 0,50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id

More Efficient Way To Write MySQL Query?

My site has suddenly started spitting out the following error:
"Incorrect key file for table '/tmp/#sql_645a_1.MYI'; try to repair it"
If I remove it, the site works fine.
My server tech support guys suggest I clean up the query and make it more efficient.
Here's the query:
SELECT *, FROM_UNIXTIME(post_time, '%Y-%c-%d %H:%i') as posttime
FROM communityposts, communitytopics, communityusers
WHERE communityposts.poster_id=communityusers.user_id
AND communityposts.topic_id=communitytopics.topic_id
ORDER BY post_time DESC LIMIT 5
Any help is greatly appreciated. Perhaps it can be done with a JOIN?
Many thanks,
Scott
UPDATE: Here's the working query; I still feel it could be optimised, though.
SELECT
communityposts.post_id, communityposts.topic_id, communityposts.post_time,
communityusers.user_id, communitytopics.topic_title, communityusers.username,
communityusers.user_avatar,
FROM_UNIXTIME(post_time, '%Y-%c-%d %H:%i') as posttime
FROM
communityposts,
communitytopics,
communityusers
WHERE
communityposts.poster_id=communityusers.user_id
AND communityposts.topic_id=communitytopics.topic_id
ORDER BY post_time DESC LIMIT 5
SELECT
A.*,B.*,C.*,FROM_UNIXTIME(post_time, '%Y-%c-%d %H:%i') as posttime
FROM
(
SELECT id,poster_id,topic_id
FROM communityposts
ORDER BY post_time DESC
LIMIT 5
) cpk
INNER JOIN communityposts A USING (id)
INNER JOIN communityusers B ON cpk.poster_id=B.user_id
INNER JOIN communitytopics C USING (topic_id);
If a community post does not have to have a user and a topic, then use LEFT JOINs for the last two joins.
You will need to create a supporting index for the cpk subquery:
ALTER TABLE communityposts ADD INDEX (post_time, id, poster_id, topic_id);
This query should be the fastest because the cpk subquery always retrieves only five keys.
UPDATE 2011-10-10 16:28 EDT
This query eliminates the ambiguous topic_id issue:
SELECT
A.post_id, cpk.topic_id, A.post_time,
B.user_id, C.topic_title, B.username,
B.user_avatar,
FROM_UNIXTIME(post_time, '%Y-%c-%d %H:%i') as posttime
FROM
(
SELECT id,poster_id,topic_id
FROM communityposts
ORDER BY post_time DESC
LIMIT 5
) cpk
INNER JOIN communityposts A USING (id)
INNER JOIN communityusers B ON cpk.poster_id=B.user_id
INNER JOIN communitytopics C ON cpk.topic_id=C.topic_id;
The temp table used for sorting the data probably gets too big. I have seen this happen when /tmp/ runs out of space. The LIMIT clause does not make it any quicker or easier, as the sorting of the full data set has to be done first.
Under some conditions, MySQL does not use a temp table to sort data. You can read about it here: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
If you manage to meet the right conditions (mostly by using the correct indexes), it will also speed up your query.
If this doesn't help (in some cases you can't escape the heavy sorting), try to find out how much free space there is on /tmp/, and see if it can be expanded. Also, as sehe mentioned, selecting only the needed columns (instead of *) can make the temp table smaller and is considered best practice anyway (and so is using explicit JOINs instead of implicit ones).
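A few standard MySQL variables and status counters are a reasonable starting point for checking whether on-disk temporary tables are the problem:
SHOW VARIABLES LIKE 'tmpdir';                        -- where on-disk temp tables are written
SHOW VARIABLES LIKE 'tmp_table_size';                -- in-memory temp table size limit
SHOW VARIABLES LIKE 'max_heap_table_size';           -- also caps in-memory temp tables
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';   -- how often sorts/groupings spilled to disk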
You could reduce the number of fields selected.
The * operator will select all fields from all (3) tables. This may get big. That said,
I think MySQL is smart enough to lay out this plan so that it doesn't need to access the data pages except for the 5 rows being selected.
Are you sure that all the involved (foreign) keys are indexed?
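If they are not, something along these lines would cover the two join columns used in the query below (index names are placeholders; the column names come from the question):
ALTER TABLE communityposts ADD INDEX idx_posts_topic (topic_id), ADD INDEX idx_posts_poster (poster_id);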
Here's my stab:
SELECT posts.*, FROM_UNIXTIME(post_time, '%Y-%c-%d %H:%i') as posttime
FROM communityposts posts
INNER JOIN communitytopics topics ON posts.topic_id = topics.topic_id
INNER JOIN communityusers users ON posts.poster_id = users.user_id
ORDER BY post_time DESC LIMIT 5