MySQL indexes on BINARY(16). What size?

I have a table with over 6.6 million rows.
It has a field named trip_id, which is a BINARY(16). I find my query too slow (0.2 seconds). This query runs about once every 3 seconds.
Before doing anything stupid, I want to know: if I lower the index size on trip_id from the full length to 12, would it make a difference?
If I try to tweak my query more, would it make a difference?
Thanks
EDIT:
Query :
SELECT stop_times.stop_id
FROM trips
LEFT JOIN stop_times ON trips.trip_id = stop_times.trip_id
WHERE trips.route_id = '141'
GROUP BY stop_times.stop_id
ORDER BY trips.trip_headsign ASC,
stop_times.stop_sequence ASC
trip_id BINARY(16)
route_id SMALLINT(3)
trip_headsign VARCHAR(50)
stop_sequence SMALLINT(3)
Explain of the query :

After doing some research, I found the problem (and yes, 0.2 seconds is slow).
SELECT t.trip_headsign, st.stop_sequence, s.stop_code, s.stop_name
FROM stop_times AS st
JOIN stops AS s USING (stop_id)
JOIN ( SELECT trip_id,
route_id,
trip_headsign
FROM trips
WHERE route_id = '141'
LIMIT 2
) AS t
WHERE t.trip_id = st.trip_id
GROUP BY st.stop_id
First, a JOIN is faster than a LEFT JOIN here. But the important point is that I was matching all results from trips in the WHERE clause.
However, since a bus route can only have 2 directions, I only need to limit my results to 2. Now my query takes about 0.018 seconds, roughly 11 times faster.

You've got 'Using temporary' and 'Using filesort' in your 'Extra' column.
These are surefire signs that you could improve things. The reason that these are showing up is because of your GROUP and ORDER clauses.
First step: are they truly necessary? You may find that, end to end, it's cheaper to sort them with the language that consumes this data.
Second step: if you still need ORDER BY, then take a look at ORDER BY Optimization in the MySQL docs. The reason that an index is not used for sorting here is the differing GROUP BY and ORDER BY clauses.
Think outside of the box. You're not doing any aggregation, so maybe grouping isn't necessary. Maybe just pull all of the rows and then ignore the duplicated ids.
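The "pull all of the rows and then ignore the duplicated ids" idea could look something like this on the application side. This is only a minimal sketch in Python: it assumes the rows arrive already sorted, as (stop_id, trip_headsign, stop_sequence) tuples, and both the function name and the sample rows are made up for illustration.

```python
def first_row_per_stop(rows):
    """Keep the first row seen for each stop_id, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        stop_id = row[0]
        if stop_id in seen:
            continue  # duplicate stop_id: skipped here instead of using GROUP BY
        seen.add(stop_id)
        unique.append(row)
    return unique


# Hypothetical rows, as the query might return them without GROUP BY.
rows = [
    (10, "Downtown", 1),
    (11, "Downtown", 2),
    (10, "Uptown", 7),
    (12, "Uptown", 8),
]
unique_rows = first_row_per_stop(rows)
```

Whether this beats GROUP BY depends on how many duplicate rows have to travel over the wire, so it is worth measuring both.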

Try adding trip_headsign to your "route" index. Because you are using that in the ORDER BY, mysql needs to go to the actual table to fetch it for every record it finds in the index that matches the route_id. If you don't see "Using index" in the Extra column of the explain, that means MySQL is forced to go back to the actual table to get additional information.

Related

Mysql Select INNER JOIN with order by very slow

I'm trying to speed up a mysql query. The Listings table has several million rows. If I don't sort them later I get the result in 0.1 seconds but once I sort it takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id
IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date DESC
LIMIT 0,10;
EXPLAIN SELECT: Using Index l=date, loc=primary, lc=primary
Performance questions like this are really difficult to answer and depend on the setup, indexes, etc. So there is unlikely to be one single solution, or even clearly correct or incorrect attempts to improve the speed. It takes a lot of trial and error. Anyway, some points I noted that often cause performance issues are:
Avoid conditions within joins that should be placed in the WHERE clause instead. A join should contain only the columns that are being joined, with no further conditions. So the "lc.cat_id='2058'" should be put in the WHERE clause.
Using IN is often slow. You could try to replace it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...)
Open the query execution plan and check whether there is something useful for you.
Try to find out if there are special cases/values within the affected columns which slow down your query.
Change "ORDER BY date" to "ORDER BY tablealias.date" and check if this makes a difference in performance. Even if not, it is easier to read.
If you can rename the column "date", do so, because using SQL keywords as table or column names is not a good idea. I'm unsure if this influences the performance, but it should be avoided if possible.
Good luck!
You can try additional indexes to speed up the query, but you'll have a tradeoff when creating/manipulating data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, there would be no need to read the record to check the location_id when using the new index. The same goes for the join with listings_categories: an index read would be enough.
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before reaching to the table itself. Your version is probably shoveling around the whole table before sorting, and then delivering the 10.
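To see that fetching the 10 ids first and then joining back returns the same rows as the straightforward query, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only demonstrates the result, not MySQL's execution plan or timings. The synthetic data, the title column and the index names are assumptions made for the example, and the join to locations is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listings (id INTEGER PRIMARY KEY, location_id INTEGER, date TEXT, title TEXT);
    CREATE TABLE listings_categories (list_id INTEGER, cat_id INTEGER);
    CREATE INDEX ix_loc_date_id ON listings (location_id, date, id);
    CREATE INDEX ix_cat_list ON listings_categories (cat_id, list_id);
""")
# 200 synthetic listings spread over 4 locations, each with a distinct, increasing date.
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?)",
    [(i, 7841 + i % 4, f"2020-01-01T00:{i // 60:02d}:{i % 60:02d}", f"listing {i}")
     for i in range(1, 201)],
)
# Every second listing belongs to category 2058.
conn.executemany(
    "INSERT INTO listings_categories VALUES (?, 2058)",
    [(i,) for i in range(2, 201, 2)],
)

# Inner query picks only the 10 ids; the outer join fetches the full rows.
DEFERRED = """
    SELECT l2.*
    FROM (
        SELECT l1.id
        FROM listings AS l1
        JOIN listings_categories AS lc ON lc.list_id = l1.id
        WHERE lc.cat_id = 2058
          AND l1.location_id IN (7841, 7842)
        ORDER BY l1.date DESC
        LIMIT 10
    ) AS x
    JOIN listings AS l2 ON l2.id = x.id
    ORDER BY l2.date DESC
"""
# The original shape: full rows are carried through the sort.
DIRECT = """
    SELECT l.*
    FROM listings AS l
    JOIN listings_categories AS lc ON lc.list_id = l.id
    WHERE lc.cat_id = 2058
      AND l.location_id IN (7841, 7842)
    ORDER BY l.date DESC
    LIMIT 10
"""
deferred_rows = conn.execute(DEFERRED).fetchall()
direct_rows = conn.execute(DIRECT).fetchall()
```

Both queries return the same 10 rows here. On MySQL, any gain would come from sorting narrow index entries instead of full rows, which EXPLAIN on the real tables should confirm.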

MySQL: Grouped/Ordered/Left Join query very slow

I have a problem with a query which takes far too long (Over two seconds just for this simple query).
At first look it appears to be an indexing issue. All joined fields are indexed, but I cannot find what else I may need to index to speed this up. As soon as I add the fields I need to the query, it gets even slower.
SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Table row counts (~): tabledef_Jobs (108k), tabledef_JobCatLink (109k), tabledef_Companies (100), tabledef_Applications (50k)
Here you can see the Describe. 'Using temporary' appears to be what is slowing down the query:
table index screenshots:
Any help would be greatly appreciated
EDIT WITH ANSWER
Final improved query with thanks to @Steve (marked answer). Ultimately, the final query was reduced from ~22s to ~0.3s:
SELECT `jobs`.`job_id` AS `job_id` FROM
(
SELECT * FROM tabledef_Jobs as jobs ORDER BY `jobs`.`date_posted` ASC LIMIT 0 , 50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Right, I’ll have a stab at this.
It would appear that the Query Optimiser cannot use an index to fulfil the query upon the tabledef_Jobs table.
You've got a LIMIT with an offset, and in combination with your ORDER BY this cannot limit the amount of data before joining. It therefore has to group by job_id, which is a PK and fast, but then order that data (temporary table and a filesort) before limiting and throwing away the vast majority of it, and finally joining everything else to it.
I would suggest, adding a composite index to jobs of “job_id, date_posted”
So firstly optimise the base query:
SELECT * FROM tabledef_Jobs
GROUP BY job_id
ORDER BY date_posted
LIMIT 0,50
Then you can combine the joins and the final structure together to make a more efficient query.
I cannot let it go by without suggesting you rethink your LIMIT offset. This is fine for small initial offsets, but when the offset gets large it can be a major cause of performance issues. Say, for example, you're using this for pagination: what happens if they want the page that starts at offset 3,000? You will use
LIMIT 3000, 50
This will then collect 3050 rows / manipulate the data and then throw away the first 3000.
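One commonly used alternative to a large offset, which this answer does not spell out, is keyset (or "seek") pagination: remember the last row of the previous page and ask for the rows after it. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL. The jobs table, its integer date_posted values and the helper names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, date_posted INTEGER)")
conn.execute("CREATE INDEX ix_date_job ON jobs (date_posted, job_id)")
# 300 synthetic jobs; three jobs share each date_posted value, so ties exist.
conn.executemany("INSERT INTO jobs VALUES (?, ?)",
                 [(i, 1000 + i // 3) for i in range(1, 301)])


def page_by_offset(offset, size):
    """Classic pagination: the engine walks past `offset` rows and discards them."""
    return conn.execute(
        "SELECT job_id, date_posted FROM jobs "
        "ORDER BY date_posted, job_id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()


def page_after(last_date, last_id, size):
    """Keyset pagination: start right after the last row of the previous page."""
    return conn.execute(
        "SELECT job_id, date_posted FROM jobs "
        "WHERE date_posted > ? OR (date_posted = ? AND job_id > ?) "
        "ORDER BY date_posted, job_id LIMIT ?",
        (last_date, last_date, last_id, size),
    ).fetchall()


first = page_by_offset(0, 50)
last_id, last_date = first[-1]
second_keyset = page_after(last_date, last_id, 50)
second_offset = page_by_offset(50, 50)
```

The tie-breaker on job_id matters: without it, rows sharing the same date_posted could be skipped or repeated across pages. The trade-off is that users can no longer jump straight to an arbitrary page number.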
[edit 1 - In response to comments below]
I will expand with some more information that might point you in the right direction. Unfortunately there isn't a simple fix that will resolve it; you must understand why this is happening to be able to address it. Simply removing the LIMIT or ORDER BY may not work, and after all you don't want to remove them, as they are part of your query and so must be there for a purpose.
Optimise the simple base query first that is usually a lot easier than working with multi-joined datasets.
Despite all the bashing it receives there is nothing wrong with filesort. Sometimes this is the only way to execute the query. Agreed it can be the cause of many performance issues (especially on larger data sets) but that’s not usually the fault of filesort but the underlying query / indexing strategy.
Within MySQL you cannot mix indexes or mix orders of the same index – performing such a task will result in a filesort.
How about as I suggested creating an index on date_posted and then using:
SELECT jobs.job_id, jobs.date_posted, jobcats.*, apps.*, company.* FROM
(
SELECT job_id, date_posted, company_id FROM tabledef_Jobs
ORDER BY date_posted
LIMIT 0,50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id

Optimizing MySQL Query With MAX()

Apologies if this has been asked before, but is there any way at all I can optimize this query to run faster? At the minute it takes about 2 seconds which, while it isn't a huge amount, makes it the slowest query on my site; all other queries take less than 0.5 secs.
Here is my query:
SELECT SQL_CALC_FOUND_ROWS MAX(images.id) AS maxID, celebrity.* FROM images
JOIN celebrity ON images.celeb_id = celebrity.id
GROUP BY images.celeb_id
ORDER BY maxID DESC
LIMIT 0,20
Here is an explain:
1 SIMPLE celebrity ALL PRIMARY NULL NULL NULL 536 Using temporary; Using filesort
1 SIMPLE images ref celeb_id celeb_id 4 celeborama_ignite.celebrity.id 191
I'm at a loss at how to improve the performance in this query further. I'm not super familiar with MySQL, but I do know that it is slow because I am sorting on the data created by MAX() and that has no index. I can't not sort on that as it gives me the results needed, but is there something else I can do to prevent it from slowing down the query?
Thanks.
If you really need a fast solution, then don't perform such queries at runtime.
Just create an additional field last_image_id in the celebrity table and update it whenever a new image is uploaded (by trigger or in your application logic, it doesn't matter).
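A rough sketch of the last_image_id idea, maintained by a trigger, is shown below. It runs on Python's built-in sqlite3 rather than MySQL, so the trigger syntax is SQLite's and a MySQL trigger would need to be written differently. The trigger and index names, the name column and the sample data are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE celebrity (id INTEGER PRIMARY KEY, name TEXT, last_image_id INTEGER);
    CREATE TABLE images (id INTEGER PRIMARY KEY, celeb_id INTEGER);
    CREATE INDEX ix_last_image ON celebrity (last_image_id);

    -- SQLite trigger syntax; this is not valid MySQL as written.
    CREATE TRIGGER trg_images_insert AFTER INSERT ON images
    BEGIN
        UPDATE celebrity
        SET last_image_id = NEW.id
        WHERE id = NEW.celeb_id
          AND (last_image_id IS NULL OR last_image_id < NEW.id);
    END;
""")
conn.executemany("INSERT INTO celebrity (id, name) VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
# Celebrity 4 gets no image; each insert fires the trigger once.
conn.executemany("INSERT INTO images (id, celeb_id) VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 1), (4, 3), (5, 2)])

# The listing query no longer needs a join, GROUP BY or MAX().
latest = conn.execute(
    "SELECT id, name, last_image_id FROM celebrity "
    "WHERE last_image_id IS NOT NULL "
    "ORDER BY last_image_id DESC LIMIT 20"
).fetchall()

# For comparison: the original aggregate formulation on the same data.
grouped = conn.execute(
    "SELECT c.id, c.name, MAX(i.id) AS maxID FROM images i "
    "JOIN celebrity c ON i.celeb_id = c.id "
    "GROUP BY c.id, c.name ORDER BY maxID DESC LIMIT 20"
).fetchall()
```

This sketch only handles inserts. If images can be deleted or reassigned, the stored value would need to be recomputed in those cases too.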
I would get the latest image this way:
SELECT c.*, i.id AS image_id
FROM celebrity c
JOIN images i ON i.celeb_id = c.id
LEFT OUTER JOIN images i2 ON i2.celeb_id = c.id AND i2.id > i.id
WHERE i2.id IS NULL
ORDER BY image_id DESC
LIMIT 0,20;
In other words, try to find a row i2 for the same celebrity with a higher id than i.id. If the outer join fails to find that match, then i.id must be the max image id for the given celebrity.
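To check that this outer-join formulation really picks the highest image id per celebrity, the sketch below runs it next to a plain GROUP BY / MAX() query on a tiny data set. It uses Python's built-in sqlite3 as a stand-in for MySQL, so it says nothing about which form is faster on MySQL. The name column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE celebrity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE images (id INTEGER PRIMARY KEY, celeb_id INTEGER);
    CREATE INDEX ix_celeb_id ON images (celeb_id, id);
""")
conn.executemany("INSERT INTO celebrity VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO images VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 1), (4, 3), (5, 2)])

# Keep row i only when no image i2 of the same celebrity has a higher id.
ANTI_JOIN = """
    SELECT c.id, c.name, i.id AS image_id
    FROM celebrity c
    JOIN images i ON i.celeb_id = c.id
    LEFT OUTER JOIN images i2 ON i2.celeb_id = c.id AND i2.id > i.id
    WHERE i2.id IS NULL
    ORDER BY image_id DESC
    LIMIT 20
"""
GROUPED = """
    SELECT c.id, c.name, MAX(i.id) AS image_id
    FROM images i
    JOIN celebrity c ON c.id = i.celeb_id
    GROUP BY c.id, c.name
    ORDER BY image_id DESC
    LIMIT 20
"""
anti_join_rows = conn.execute(ANTI_JOIN).fetchall()
grouped_rows = conn.execute(GROUPED).fetchall()
```

Both queries return one row per celebrity, carrying that celebrity's highest image id.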
SQL_CALC_FOUND_ROWS can cause queries to run extremely slowly. I've found some cases where just removing the SQL_CALC_FOUND_ROWS made the query run 200x faster (but it could also make only a small difference in other cases, it depends on the table, so you should test both ways).
If you need the equivalent of SQL_CALC_FOUND_ROWS, just run a separate query:
SELECT COUNT(*) FROM celebrity;
I think you need a compound index on (celeb_id, id) in table images (supposing it's a MyISAM table), so the GROUP BY celeb_id and MAX(id) can use this index.
But with big tables, you'll probably have to follow @zerkms' advice and add a new column to the celebrity table.
MySQL doesn't perform so well with joins. I would recommend dividing your query in two: in the first query select the celebrities, and then select the images. Simply avoid joins.
Check out this link - http://phpadvent.org/2011/a-stitch-in-time-saves-nine-by-paul-jones
SELECT STRAIGHT_JOIN *
FROM (
SELECT MAX(id) as maxID, celeb_id as id
FROM images
GROUP BY celeb_id
ORDER by maxID DESC
LIMIT 0, 20) as ids
JOIN celebrity USING (id);
This query does not allow the row count to be precalculated, but an additional:
SELECT COUNT(DISTINCT celeb_id)
FROM images;
or even (if each celebrity has an image):
SELECT COUNT(*) FROM celebrity;
will not cost much, because it can easily be cached by the query cache (if it is not switched off).

MySQL performance, inner join, how to avoid Using temporary and filesort

I have a table 1 and table 2.
Table 1
PARTNUM - ID_BRAND
partnum is the primary key
id_brand is "indexed"
Table 2
ID_BRAND - BRAND_NAME
id_brand is the primary key
brand_name is "indexed"
Table 1 contains 1 million records and table 2 contains 1,000 records.
I'm trying to optimize a query using EXPLAIN, and after a lot of tries I have reached a dead end.
EXPLAIN
SELECT pm.partnum, pb.brand_name
FROM products_main AS pm
LEFT JOIN products_brands AS pb ON pm.id_brand=pb.id_brand
ORDER BY pb.brand ASC
LIMIT 0, 10
The query returns this execution plan:
ID, SELECT_TYPE, TABLE, TYPE, POSSIBLE_KEYS, KEY, KEY_LEN , REF, ROWS, EXTRA
1, SIMPLE, pm, range, PRIMARY, PRIMARY, 1, , 1000000, Using where; Using temporary; Using filesort
1, SIMPLE, pb, ref, PRIMARY, PRIMARY, 4, demo.pm.id_pbrand, 1,
The MySQL query optimizer shows a temporary + filesort in the execution plan.
How can I avoid this?
The "EVIL" is in the ORDER BY pb.brand ASC. Ordering by that external field seems to be the bottleneck..
First of all, I question the use of an outer join seeing as the order by is operating on the rhs, and the NULL's injected by the left join are likely to play havoc with it.
Regardless, the simplest approach to speeding up this query would be a covering index on pb.id_brand and pb.brand. This will allow the order by to be evaluated 'using index' with the join condition. The alternative is to find some way to reduce the size of the intermediate result passed to the order-by.
Still, the combination of outer-join, order-by, and limit, leaves me wondering what exactly you are querying for, and if there might not be a better way of expressing the query itself.
Try replacing the join with a subquery. MySQL's optimizer kind of sucks; subqueries often give better performance than joins.
First, try changing your index on the products_brands table. Delete the existing one on brand_name, and create a new one:
ALTER TABLE products_brands ADD INDEX newIdx (brand_name, id_brand)
Then, the table will already have a "orderedByBrandName" index with the ids you need for the join, and you can try:
EXPLAIN
SELECT pb.brand_name, pm.partnum
FROM products_brands AS pb
LEFT JOIN products_main AS pm ON pb.id_brand = pm.id_brand
LIMIT 0, 10
Note that I also changed the order of the tables in the query, so you start with the small one.
This question is somewhat outdated, but I did find it, and so will other people.
MySQL uses a temporary table if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
So you just need to reverse the join order by using STRAIGHT_JOIN, to bypass the order chosen by the optimizer:
SELECT STRAIGHT_JOIN pm.partnum, pb.brand_name
FROM products_brands AS pb
RIGHT JOIN products_main AS pm ON pm.id_brand=pb.id_brand
ORDER BY pb.brand ASC
LIMIT 0, 10
Also make sure that max_heap_table_size AND tmp_table_size variables are set to a number big enough to store the results:
SET global tmp_table_size=100000000;
SET global max_heap_table_size=100000000;
-- 100 megabytes in this example. These can be set in my.cnf config file, too.

What's wrong with this query? EXPLAIN looks fine to me

I'm going through an application and trying to optimize some queries and I'm really struggling with a few of them. Here's an example:
SELECT `Item` . * , `Source` . * , `Keyword` . * , `Author` . *
FROM `items` AS `Item`
JOIN `sources` AS `Source` ON ( `Item`.`source_id` = `Source`.`id` )
JOIN `authors` AS `Author` ON ( `Item`.`author_id` = `Author`.`id` )
JOIN `items_keywords` AS `ItemsKeyword` ON ( `Item`.`id` = `ItemsKeyword`.`item_id` )
JOIN `keywords` AS `Keyword` ON ( `Keyword`.`id` = `ItemsKeyword`.`keyword_id` )
JOIN `keywords_profiles` AS `KeywordsProfile` ON ( `Keyword`.`id` = `KeywordsProfile`.`keyword_id` )
JOIN `profiles` AS `Profile` ON ( `Profile`.`id` = `KeywordsProfile`.`profile_id` )
WHERE `KeywordsProfile`.`profile_id` IN ( 17 )
GROUP BY `Item`.`id`
ORDER BY `Item`.`timestamp` DESC , `Item`.`id` DESC
LIMIT 0 , 20;
This one is taking 10-30 seconds...in the tables referenced, there are about 500k author rows, and about 750k items and items_keywords rows. Everything else is less than 500 rows.
Here's the explain output:
http://img.skitch.com/20090220-fb52wd7jf58x41ikfxaws96xjn.jpg
EXPLAIN is relatively new to me, but I went through this line by line and it all seems fine. Not sure what else I can do, as I've got indexes on everything...what am I missing?
The server this sits on is just a 256 slice over at slicehost, but there's nothing else running on it and the CPU is at 0% before its run. And yet still it cranks away on this query. Any ideas?
EDIT: Some further info; one of the things that makes this really frustrating is that if I repeatedly run this query, it takes less than .1 seconds. I'm assuming this is due to the query cache, but if I run RESET QUERY CACHE before it, it still runs extremely quickly. It's only after I wait a little while or run some other queries that the 10-30 second times return. All the tables are MyISAM...does this indicate that MySQL is loading stuff into memory and that's why it runs so much faster for awhile?
EDIT 2: Thanks so much to everyone for your help...an update...I cut everything down to this:
SELECT i.id
FROM items AS i
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;
Consistently took 5-6 seconds, despite there only being 750k records in the DB. Once I dropped the 2nd column on the ORDER BY clause, it was pretty much instant. There's obviously several things going on here, but when I cut the query down to this:
SELECT i.id
FROM items AS i
JOIN items_keywords AS ik ON ( i.id = ik.item_id )
JOIN keywords AS k ON ( k.id = ik.keyword_id )
JOIN keywords_profiles AS kp ON ( k.id = kp.keyword_id )
WHERE kp.profile_id IN (139)
ORDER BY i.timestamp DESC
LIMIT 20;
It's still taking 10+ seconds...what else can I do?
Minor curiosity: on the explain, the rows column for items_keywords is always 1544, regardless of what profile_id I'm using in the query. shouldn't it change depending on the number of items associated with that profile?
EDIT 3: Ok, this is getting ridiculous :). If I drop the ORDER BY clause entirely, things are very speedy and the temp table / filesort disappears from explain. There's currently an index on the item.timestamp column, but is it not being used for some reason? I thought I remembered something about mysql only using one index per table or something? should I create a multi-column index over all the columns on the items table that this query references (source_id, author_id, timestamp, etc)?
Try this and see how it does:
SELECT i.*, s.*, k.*, a.*
FROM items AS i
JOIN sources AS s ON (i.source_id = s.id)
JOIN authors AS a ON (i.author_id = a.id)
JOIN items_keywords AS ik ON (i.id = ik.item_id)
JOIN keywords AS k ON (k.id = ik.keyword_id)
WHERE k.id IN (SELECT kp.keyword_id
FROM keywords_profiles AS kp
WHERE kp.profile_id IN (17))
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;
I factored out a couple of the joins into a non-correlated subquery, so you wouldn't have to do a GROUP BY to map the result to distinct rows.
Actually, you may still get multiple rows per i.id in my example, depending on how many keywords map to a given item and also to profile_id 17.
The filesort reported in your EXPLAIN report is probably due to the combination of GROUP BY and ORDER BY using different fields.
I agree with @ʞɔıu's answer that the speedup is probably because of key caching.
It looks okay, every row in the explain is using an index of some sort. One possible worry is the filesort bit. Try running the query without the order by clause and see if that improves it.
Then, what I would do is gradually take out each join until you (hopefully) get that massive speed increase, then concentrate on why that's happening.
The reason I mention the filesort is because I can't see a mention of timestamp anywhere in the explain output (even though it's your primary sort criteria) - it might be requiring a full non-indexed sort.
UPDATE#1:
Based on edit#2, the query:
SELECT i.id
FROM items AS i
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;
takes 5-6 seconds. That's abhorrent. Try creating a composite index on both TIMESTAMP and ID and see if that improves it:
create index timestamp_id on items(timestamp,id);
select id from items order by timestamp desc,id desc limit 0,20;
select id from items order by timestamp,id limit 0,20;
select id from items order by timestamp desc,id desc;
select id from items order by timestamp,id;
On one of the tests, I've left off the descending bit (DB2 for one sometimes doesn't use indexes if they're in the opposite order). The other variation is to take off the limit in case that's affecting it.
For your query to run fast, you need:
Create an index: CREATE INDEX ix_timestamp_id ON items (timestamp, id)
Ensure that id's on sources, authors and keywords are primary keys.
Force MySQL to use this index for items, and perform NESTED LOOP joins for other items:
EXPLAIN EXTENDED
SELECT Item.*, Source . * , Keyword . * , Author . *
FROM items AS Item FORCE INDEX FOR ORDER BY (ix_timestamp_id)
JOIN items_keywords AS ItemsKeyword FORCE INDEX (ix_item_keyword) ON ( Item.id = ItemsKeyword.item_id AND ItemsKeyword.keyword_id IN
(
SELECT keyword_id
FROM keywords_profiles AS KeywordsProfile FORCE INDEX (ix_keyword_profile)
WHERE KeywordsProfile.profile_id = 17
)
)
JOIN sources AS Source FORCE INDEX (primary) ON ( Item.source_id = Source.id )
JOIN authors AS Author FORCE INDEX (primary) ON ( Item.author_id = Author.id )
JOIN keywords AS Keyword FORCE INDEX (primary) ON ( Keyword.id = ItemsKeyword.keyword_id )
ORDER BY Item.timestamp DESC, Item.id DESC
LIMIT 0, 20
As you can see, we get rid of GROUP BY, push the subquery into the JOIN condition and force PRIMARY KEYs to be used for joins.
That's how we ensure that NESTED LOOPS with items as a leading tables will be used for all joins.
As a result:
1, 'PRIMARY', 'Item', 'index', '', 'ix_timestamp_id', '12', '', 20, 2622845.00, ''
1, 'PRIMARY', 'Author', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.author_id', 1, 100.00, ''
1, 'PRIMARY', 'Source', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.source_id', 1, 100.00, ''
1, 'PRIMARY', 'ItemsKeyword', 'ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.id', 1, 100.00, 'Using where; Using index'
1, 'PRIMARY', 'Keyword', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.ItemsKeyword.keyword_id', 1, 100.00, ''
2, 'DEPENDENT SUBQUERY', 'KeywordsProfile', 'unique_subquery', 'PRIMARY', 'PRIMARY', '8', 'func,const', 1, 100.00, 'Using index; Using where'
, and when we run this, we get
20 rows fetched in 0,0038s (0,0019s)
There are 500k in items, 600k in items_keywords, 512 values in keywords and 512 values in keywords_profiles (all with profile 17).
I would suggest you run a profiler on the query; then you can see how long each subquery took and where the time is being consumed. If you have phpMyAdmin, it's a simple checkbox you need to tick to get this functionality, but my guess is you can get it manually from the mysql terminal app as well. I haven't seen this EXPLAIN thing before; if it is in fact the profiling I am used to in phpMyAdmin, I apologize for the nonsense.
What is the GROUP BY clause achieving? There are no aggregate functions in the SELECT so the GROUP BY should be unnecessary
Some things to try:
Try not selecting all columns from all tables; select only what you need. Selecting everything may preclude the use of covering indexes (look for 'Using index' in the Extra column) and in general will soak up a lot of needless I/O.
That filesort looks a little troubling. Try removing the order by and replacing it with order by null -- group by implicitly sorts in mysql so you have to order by null to remove that implicit sort.
Try adding an index on item (timestamp, id) or (id, timestamp). Might do something about that filesort (you never know).
Why are you grouping by item id and not selecting any aggregate columns? If you group by a column and then select (much less order by) some other non-aggregate columns, the values of those columns will be chosen more or less arbitrarily. Unless item id is always unique for this query, in which case the GROUP BY will not accomplish anything.
Lastly, in my experience, MySQL will sometimes just inexplicably freak out if you give it too many joins to try to optimize. Try to figure out if there's some way to avoid doing so many joins all at once, i.e. split it up into multiple queries if you can.
one of the things that makes this really frustrating is that if I repeatedly run this query, it takes less than .1 seconds. I'm assuming this is due to the query cache — add SQL_NO_CACHE after the SELECT keyword to disable the use of the query cache per this query
All the tables are MyISAM...does this indicate that MySQL is loading stuff into memory and that's why it runs so much faster for awhile — MyISAM uses a key buffer and only caches index data in memory, and relies on the OS to hopefully cache non-index data. Unlike Innodb, which caches everything in the buffer pool.
Is it possible you're having issues because of filesystem I/O? The EXPLAIN shows that 1544 rows have to be fetched from the ItemsKeyword table. If you have to go to disk for each of those, you'll add about 10-15 seconds in total to the run time (assuming a high-ish seek time because you're on a VM). Normally the tables are cached in RAM, or the data is stored close enough together on the disk that reads can be combined. However, you're running on a VM with 256MB of RAM, so you may have no spare memory to cache into, and if your table file is fragmented enough, query performance could be degraded this much.
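The 10-15 second figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes one uncached disk seek per fetched row and a per-seek cost of roughly 6.5 to 10 ms. Those seek times are assumptions chosen to match the estimate above, not measurements from the server in question.

```python
ROWS_FETCHED = 1544  # "rows" estimate for ItemsKeyword taken from the EXPLAIN


def estimated_seconds(rows, seek_ms):
    """Worst case: one uncached disk seek per row, nothing else counted."""
    return rows * seek_ms / 1000.0


low = estimated_seconds(ROWS_FETCHED, 6.5)    # about 10 seconds
high = estimated_seconds(ROWS_FETCHED, 10.0)  # about 15 seconds
```

If the real per-row cost is anywhere near this range, caching or clustering those rows matters far more than any CPU-side tuning.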
You could probably get some idea of what's happening with I/O during the query by running something like pidstat -d 1 or iostat 1 in another shell on the server.
EDIT:
From looking at the query adding an index on (ItemsKeyword.item_id, ItemsKeyword.keyword_id) should fix it if my theory is right about it being a problem with the seeks for the ItemsKeyword table.
MySQL loads a lot into different caches, including indexes and queries. In addition, your operating system will keep a file system cache that could speed up your query when executed repeatedly.
One thing to consider is how MySQL creates temporary tables during this type of query. As you can see in your explain, a temporary table is being created, probably for sorting of the results. Generally, MySQL will create these temporary tables in memory, except for 2 conditions. First, if they exceed the maximum size set in MySQL settings (max temp table size or heap size - check mysqlperformanceblogs.com for more info on these settings). The second and more important one is this:
Temporary tables will always be created on disk when text or blob columns are selected in the query.
This can create a major performance hit, and even lead to an i/o bottleneck if your server is getting any amount of action.
Check to see if any of your columns are of this data type. If they are, you can try to rewrite the query so that a temporary table is not created (group by always causes them, I think), or try not selecting these out. Another strategy would be to break this up into several smaller queries that might execute in a fraction of the time.
Good luck!
I may be completely wrong but what happens when you change
WHERE kp.profile_id IN (139)
to
WHERE kp.profile_id = 139
Try this:
SELECT i.id
FROM ((items AS i
INNER JOIN items_keywords AS ik ON ( i.id = ik.item_id ))
INNER JOIN keywords AS k ON ( k.id = ik.keyword_id ))
INNER JOIN keywords_profiles AS kp ON ( k.id = kp.keyword_id AND kp.profile_id = 139)
ORDER BY i.timestamp DESC
LIMIT 20;
Looking at the pastie.org link in the comments to the question:
you're joining items.source_id int(4) to sources.id int(16)
also items.id int(16) to itemskeywords.item_id int(11)
I can't see any good reason for the two fields to have different widths in these cases
I realise that these are just display widths and that the actual range of numbers which the column can store is determined solely by the INT part but the MySQL 6.0 reference manual says:
"Note that if you store larger values than the display width in an integer column, you may experience problems when MySQL generates temporary tables for some complicated joins, because in these cases MySQL assumes that the data fits into the original column width."
From the rough figures you quoted, it doesn't look as though you are exceeding the display width on any of the ID columns. You may as well tidy up these inconsistencies though just to eliminate another possible bug.
You might be as well to remove the display widths altogether if you don't have a need for them
edit:
I would hazard a guess that the original author of the database perhaps thought that int(4) meant "an integer with up to 4 digits" whereas it actually means "an integer between -2147483648 and 2147483647 displayed with at least 4 characters left-padded with spaces if need be"
Definitions like authors.refreshed int(20) or items.timestamp int(30) don't really make sense as there can only be 10 digits plus the sign in an int. Even a bigint can't exceed 20 characters. Perhaps the original author thought that int(4) was analogous to varchar(4)?
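The digit counts above are easy to verify. The sketch below is plain Python arithmetic, not MySQL itself: it computes the signed 32-bit and 64-bit ranges and mimics the "pad to at least N characters" display rule with a hypothetical helper named displayed.

```python
INT_MIN = -(2 ** 31)             # lower bound of a signed 32-bit INT
INT_MAX = 2 ** 31 - 1            # upper bound of a signed 32-bit INT
BIGINT_MIN = -(2 ** 63)          # lower bound of a signed 64-bit BIGINT
BIGINT_UNSIGNED_MAX = 2 ** 64 - 1


def displayed(value, width):
    """Left-pad with spaces to at least `width` characters, never truncating."""
    return str(value).rjust(width)


int_digits = len(str(INT_MAX))  # digits in the largest INT
bigint_chars = max(len(str(BIGINT_MIN)), len(str(BIGINT_UNSIGNED_MAX)))

padded = displayed(42, 4)        # shorter than the width: padded
unchanged = displayed(123456, 4) # longer than the width: shown in full
```

So a width such as int(20) or int(30) can never be filled by an INT value, and the width never limits what the column can store.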
Try a backup copy of your tables. After that rename the original tables to something else, rename the new tables to the original and try again with your new-but-old-named tables...
Or you can try to repair the tables, but this doesn't always help.
Edit: Man, this was an old question...
The problem appears to be that it has to do full joins across every single table before it even tries to apply the WHERE clause. With around 500k rows per table, you're looking at millions of rows being populated in memory. I would try changing the JOINs to LEFT JOINs with USING ().