MySQL JOIN time reduction - mysql

This query is taking over a minute to complete:
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY count(*) DESC
LIMIT 5
Every keyword has an ID associated with it (keyword_id column). And that ID is used to look up the actual keyword from the keyword table.
movie_keyword has 2.8 million rows
keyword has 127,000
However to return just the most used keyword_id's takes only 1 second:
SELECT keyword_id, count(*)
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
Is there a more efficient way of doing this?
Output with EXPLAIN:
1 SIMPLE keyword ALL PRIMARY NULL NULL NULL 125405 Using temporary; Using filesort
1 SIMPLE movie_keyword ref idx_keywordid idx_keywordid 4 imdb.keyword.id 28 Using index
Structure:
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_mid` (`movie_id`),
KEY `idx_keywordid` (`keyword_id`),
KEY `keyword_ix` (`keyword_id`),
CONSTRAINT `movie_keyword_keyword_id_exists` FOREIGN KEY (`keyword_id`) REFERENCES `keyword` (`id`),
CONSTRAINT `movie_keyword_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4256379 DEFAULT CHARSET=latin1;
CREATE TABLE `keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword` text NOT NULL,
`phonetic_code` varchar(5) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_keyword` (`keyword`(5)),
KEY `idx_pcode` (`phonetic_code`),
KEY `keyword_ix` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127044 DEFAULT CHARSET=latin1;

Untested, but this should work and, I think, be significantly faster. I'm not sure whether MySQL allows LIMIT in a subquery, but there are other ways around that.
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
)
GROUP BY keyword
ORDER BY count(*) DESC;
This should be faster because you don't join all 2.8 million rows in movie_keyword with keyword, just the ones that actually match, which I'm guessing are significantly fewer.
EDIT: since MySQL doesn't support LIMIT inside an IN subquery, you have to run
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5;
first and after fetching the results run the second query
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS)
GROUP BY keyword
ORDER BY count(*) DESC;
Replace RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS with the actual values, programmatically, from whatever language you're using.
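A minimal sketch of the two-step approach in Python. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL, and the sample rows and the helper name top_keywords are made up for illustration. Most MySQL drivers use %s rather than ? as the placeholder. Because the first query already returns the counts, this sketch reuses them instead of counting again in the second query.

```python
import sqlite3

def top_keywords(conn, n=5):
    """Two-step lookup: find the top-n keyword ids, then fetch their text."""
    # Step 1: aggregate on the narrow linking table only.
    top = conn.execute(
        "SELECT keyword_id, COUNT(*) AS cnt FROM movie_keyword "
        "GROUP BY keyword_id ORDER BY cnt DESC, keyword_id LIMIT ?",
        (n,),
    ).fetchall()
    if not top:
        return []
    counts = dict(top)
    # Step 2: one placeholder per id, so no values are pasted into the SQL text.
    placeholders = ", ".join("?" for _ in counts)
    rows = conn.execute(
        f"SELECT id, keyword FROM keyword WHERE id IN ({placeholders})",
        list(counts),
    ).fetchall()
    # Re-attach the counts from step 1 and sort by count, highest first.
    return sorted(((kw, counts[i]) for i, kw in rows), key=lambda r: (-r[1], r[0]))

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
CREATE TABLE movie_keyword (id INTEGER PRIMARY KEY, movie_id INTEGER, keyword_id INTEGER);
INSERT INTO keyword VALUES (1, 'murder'), (2, 'sequel'), (3, 'noir');
INSERT INTO movie_keyword (movie_id, keyword_id)
VALUES (10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3);
""")
result = top_keywords(conn, n=2)
print(result)
```

Building the placeholder list from the number of ids keeps the values out of the SQL string, which avoids quoting problems if the same pattern is later used with text values.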

The query seems fine, but I think the structure is not. Try adding an index on the column
keyword.id
Try
CREATE INDEX keyword_ix ON keyword (id);
or
ALTER TABLE keyword ADD INDEX keyword_ix (id);
It would be much better if you could post the structures of your tables, keyword and movie_keyword. Which of the two is the main table and which is the referencing table?
SELECT keyword, count(movie_keyword.id) as 'Number of Occurences'
FROM movie_keyword
INNER JOIN keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY `Number of Occurences` DESC
LIMIT 5

I know this is a pretty old question, but I think xception forgot about derived tables in MySQL, so I want to suggest another solution. It requires only one query and avoids joining the big table before aggregating. If someone has data this large and can test it (maybe the question's author), please share the results.
SELECT keyword.keyword, _temp.occurences
FROM (
SELECT keyword_id, COUNT( keyword_id ) AS occurences
FROM movie_keyword
GROUP BY keyword_id
ORDER BY occurences DESC
LIMIT 5
) AS _temp
JOIN keyword ON _temp.keyword_id = keyword.id
ORDER BY _temp.occurences DESC
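To make the difference concrete, here is a small sketch comparing "join, then aggregate" with "aggregate in a derived table, then join". It runs on Python's built-in sqlite3 with a few made-up rows as a stand-in for MySQL, so it only shows that the two forms return the same rows, not how they perform.

```python
import sqlite3

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
CREATE TABLE movie_keyword (id INTEGER PRIMARY KEY, movie_id INTEGER, keyword_id INTEGER);
CREATE INDEX idx_keywordid ON movie_keyword (keyword_id);
INSERT INTO keyword VALUES (1, 'murder'), (2, 'sequel'), (3, 'noir');
INSERT INTO movie_keyword (movie_id, keyword_id)
VALUES (10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3);
""")

# Join every linking row to its keyword first, then group and count.
JOIN_FIRST = """
SELECT k.keyword, COUNT(*) AS occurrences
FROM movie_keyword mk
JOIN keyword k ON k.id = mk.keyword_id
GROUP BY k.id
ORDER BY occurrences DESC, k.id
LIMIT 2
"""

# Count on the linking table alone, keep the top rows, then look up the text.
AGGREGATE_FIRST = """
SELECT k.keyword, t.occurrences
FROM (
    SELECT keyword_id, COUNT(*) AS occurrences
    FROM movie_keyword
    GROUP BY keyword_id
    ORDER BY occurrences DESC, keyword_id
    LIMIT 2
) AS t
JOIN keyword k ON k.id = t.keyword_id
ORDER BY t.occurrences DESC, k.id
"""

join_first = conn.execute(JOIN_FIRST).fetchall()
aggregate_first = conn.execute(AGGREGATE_FIRST).fetchall()
print(join_first == aggregate_first)
```

One caveat: the derived-table form counts per keyword_id, while the original query groups by the keyword text. The two agree only if each keyword text appears under a single id.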

Related

MySQL - how to optimize query with order by

I am trying to generate a list of the 5 most recent history items for a collection of user tasks. If I remove the ORDER BY, the execution time drops from ~2 seconds to < 20 msec.
Indexes are on
h.task_id
h.mod_date
i.task_id
i.user_id
This is the query
SELECT h.*
, i.task_id
, i.user_id
, i.name
, i.completed
FROM h
, i
WHERE i.task_id = h.task_id
AND i.user_id = 42
ORDER
BY h.mod_date DESC
LIMIT 5
Here is the explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE i ref PRIMARY,UserID UserID 4 const 3091 Using temporary; Using filesort
1 SIMPLE h ref TaskID TaskID 4 myDB.i.task_id 7
Here are the show create tables:
CREATE TABLE `h` (
`history_id` int(6) NOT NULL AUTO_INCREMENT,
`history_code` tinyint(4) NOT NULL DEFAULT '0',
`task_id` int(6) NOT NULL,
`mod_date` datetime NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`history_id`),
KEY `TaskID` (`task_id`),
KEY `historyCode` (`history_code`),
KEY `modDate` (`mod_date`)
) ENGINE=InnoDB AUTO_INCREMENT=185647 DEFAULT CHARSET=latin1
and
CREATE TABLE `i` (
`task_id` int(6) NOT NULL AUTO_INCREMENT,
`user_id` int(6) NOT NULL,
`name` varchar(60) NOT NULL,
`due_date` date DEFAULT NULL,
`create_date` date NOT NULL,
`completed` tinyint(1) NOT NULL DEFAULT '0',
`task_description` blob,
PRIMARY KEY (`task_id`),
KEY `name_2` (`name`),
KEY `UserID` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12085 DEFAULT CHARSET=latin1
Add this index to h:
INDEX(task_id, mod_date, history_id) -- in this order
It will be "covering", and the columns will be in the optimal order.
Also, DROP
KEY `TaskID` (`task_id`)
so that the Optimizer won't be tempted to use it.
Try changing the index on h.task_id so it's this compound index.
ALTER TABLE h DROP INDEX TaskID, ADD INDEX TaskID (task_id, mod_date DESC);
This may (or may not) allow MySQL to shortcut some or all of the extra work in your ORDER BY ... LIMIT ... request. ORDER BY ... LIMIT is a notorious performance antipattern, by the way, but sometimes necessary.
Edit: the index didn't help. So let's try a so-called deferred join so we don't have to ORDER and then LIMIT all the data from your h table.
Start with this subquery. It retrieves only the primary key values for the rows involved in your results, and will generate just five rows.
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
Why this subquery? It handles the work-intensive ORDER BY ... LIMIT operation while manipulating only the primary keys and the date. It still must sort tons of rows only to discard all but five, but the rows it has to handle are much shorter. Because this subquery does the heavy work, you focus on optimizing it, rather than the whole query.
Keep the index I suggested above, because it covers the subquery for h.
Then, join it to the rest of your query like this. That way you'll only have to retrieve the expensive h.description column for the five rows you care about.
SELECT h.* , i.task_id, i.user_id , i.name, i.completed
FROM h
JOIN i ON i.task_id = h.task_id
JOIN (
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
) selected ON h.history_id = selected.history_id
AND i.task_id = selected.task_id
ORDER BY h.mod_date DESC
LIMIT 5
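The deferred-join idea can be checked on a toy data set. The sketch below uses Python's sqlite3 as a stand-in for MySQL, with made-up rows and an index named h_task_date that is an assumption of this example. It only confirms that the deferred form returns the same five rows as the direct query. Any speed-up depends on the real data and on how wide the h rows are.

```python
import sqlite3

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE i (task_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE h (history_id INTEGER PRIMARY KEY, task_id INTEGER NOT NULL,
                mod_date TEXT NOT NULL, description TEXT NOT NULL);
CREATE INDEX h_task_date ON h (task_id, mod_date);
INSERT INTO i VALUES (1, 42, 'write report'), (2, 42, 'file taxes'), (3, 7, 'other user');
""")
# 20 history rows spread over the three tasks, each with a bulky description.
conn.executemany(
    "INSERT INTO h (task_id, mod_date, description) VALUES (?, ?, ?)",
    [(n % 3 + 1, f"2024-01-{n + 1:02d}", "long text " * 50) for n in range(20)],
)

# Direct form: sorts full-width rows, then keeps five.
DIRECT = """
SELECT h.history_id, h.mod_date, h.description, i.name
FROM h JOIN i ON i.task_id = h.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
"""

# Deferred form: pick the five primary keys first, then fetch the wide columns.
DEFERRED = """
SELECT h.history_id, h.mod_date, h.description, i.name
FROM (
    SELECT h.history_id
    FROM h JOIN i ON i.task_id = h.task_id
    WHERE i.user_id = 42
    ORDER BY h.mod_date DESC
    LIMIT 5
) AS selected
JOIN h ON h.history_id = selected.history_id
JOIN i ON i.task_id = h.task_id
ORDER BY h.mod_date DESC
"""

direct = conn.execute(DIRECT).fetchall()
deferred = conn.execute(DEFERRED).fetchall()
print(direct == deferred)
```

Note that the inner and outer ORDER BY use the same direction. If they differ, the subquery picks the five oldest rows and the outer query merely re-sorts them.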

How do I SELECT from a table with a JOIN with multiple matching values?

I have the following simple query that works just fine when there is one keyword to match:
SELECT gc.id, gc.name
FROM gift_card AS gc
JOIN keyword ON gc.id = keyword.gc_id
WHERE keyword = 'mini'
GROUP BY gc.id
ORDER BY id DESC
What I want to do is find the id's that match at least two of the keywords I provide. I thought just adding a simple AND would work but I get blank results.
SELECT gc.id, gc.name
FROM gift_card AS gc
JOIN keyword ON gc.id = keyword.gc_id
WHERE keyword = 'mini'
AND keyword = '2012'
GROUP BY gc.id
ORDER BY id DESC
Obviously SQL is not my strong suit, so I am looking for some help on what I am doing wrong here.
Here are my table structures:
CREATE TABLE `gift_card` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=52 DEFAULT CHARSET=utf8;
CREATE TABLE `keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`gc_id` int(11) NOT NULL,
`keyword` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
UNIQUE KEY `dupes_UNIQUE` (`gc_id`,`keyword`)
) ENGINE=InnoDB AUTO_INCREMENT=477 DEFAULT CHARSET=utf8;
No, AND does not work. A column cannot have two different values in one row.
Instead, use OR (or IN) and a bit more logic:
SELECT gc.id, gc.name
FROM gift_card gc JOIN
keyword k
ON gc.id = k.gc_id
WHERE k.keyword IN ('mini', '2012')
GROUP BY gc.id
HAVING COUNT(*) = 2 -- both match
ORDER BY gc.id DESC;
It is a good idea to qualify all column names in a query that has more than one table reference.
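Here is a small runnable sketch of the IN plus HAVING COUNT pattern, generalized to any number of keywords. It uses Python's sqlite3 as a stand-in for MySQL. The sample rows and the helper name cards_matching_all are made up for illustration.

```python
import sqlite3

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gift_card (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE keyword (id INTEGER PRIMARY KEY, gc_id INTEGER NOT NULL,
                      keyword TEXT NOT NULL, UNIQUE (gc_id, keyword));
INSERT INTO gift_card VALUES (1, 'Mini 2012'), (2, 'Mini classic'), (3, 'Calendar 2012');
INSERT INTO keyword (gc_id, keyword)
VALUES (1, 'mini'), (1, '2012'), (2, 'mini'), (3, '2012');
""")

def cards_matching_all(conn, keywords):
    """Return (id, name) of gift cards tagged with every keyword in `keywords`."""
    wanted = sorted(set(keywords))  # duplicates would break the count check
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT gc.id, gc.name
        FROM gift_card gc
        JOIN keyword k ON gc.id = k.gc_id
        WHERE k.keyword IN ({placeholders})
        GROUP BY gc.id, gc.name
        HAVING COUNT(*) = ?
        ORDER BY gc.id DESC
    """
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()

both = cards_matching_all(conn, ["mini", "2012"])

# The AND version tests a single row at a time, so it can never match.
and_version = conn.execute(
    "SELECT gc.id FROM gift_card gc JOIN keyword k ON gc.id = k.gc_id "
    "WHERE k.keyword = 'mini' AND k.keyword = '2012'"
).fetchall()
print(both, and_version)
```

COUNT(*) is safe here only because the UNIQUE key on (gc_id, keyword) rules out duplicate rows. Without that constraint, COUNT(DISTINCT k.keyword) would be the safer check.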

Slow query with multiple where and order by clauses

I'm trying to find a way to speed up a slow (filesort) MySQL query.
Tables:
categories (id, lft, rgt)
questions (id, category_id, created_at, votes_up, votes_down)
Example query:
SELECT * FROM questions q
INNER JOIN categories c ON (c.id = q.category_id)
WHERE c.lft > 1 AND c.rgt < 100
ORDER BY q.created_at DESC, q.votes_up DESC, q.votes_down ASC
LIMIT 4000, 20
If I remove the ORDER BY clause, it's fast. I know MySQL doesn't like both DESC and ASC orders in the same clause, so I tried adding a composite (created_at, votes_up) index to the questions table and removed q.votes_down ASC from the ORDER BY clause. That didn't help and it seems that the WHERE clause gets in the way here because it filters by columns from another (categories) table. However, even if it worked, it wouldn't be quite right since I do need the q.votes_down ASC condition.
What are good strategies to improve performance in this case? I'd rather avoid restructuring the tables, if possible.
EDIT:
CREATE TABLE `categories` (
`id` int(11) NOT NULL auto_increment,
`lft` int(11) NOT NULL,
`rgt` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `lft_idx` (`lft`),
KEY `rgt_idx` (`rgt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `questions` (
`id` int(11) NOT NULL auto_increment,
`category_id` int(11) NOT NULL,
`votes_up` int(11) NOT NULL default '0',
`votes_down` int(11) NOT NULL default '0',
`created_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `questions_FI_1` (`category_id`),
KEY `votes_up_idx` (`votes_up`),
KEY `votes_down_idx` (`votes_down`),
KEY `created_at_idx` (`created_at`),
CONSTRAINT `questions_FK_1` FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE q ALL questions_FI_1 NULL NULL NULL 31774 Using filesort
1 SIMPLE c eq_ref PRIMARY,lft_idx,rgt_idx PRIMARY 4 ttt.q.category_id 1 Using where
Try a subquery to get the desired categories:
SELECT * FROM questions
WHERE category_id IN ( SELECT id FROM categories WHERE lft > 1 AND rgt < 100 )
ORDER BY created_at DESC, votes_up DESC, votes_down ASC
LIMIT 4000, 20
Try selecting only what you need in your query, instead of the SELECT *
Why not to use SELECT * ( ALL ) in MySQL
Try putting conditions, concerning joined tables into ON clauses:
SELECT * FROM questions q
INNER JOIN categories c ON (c.id = q.category_id AND c.lft > 1 AND c.rgt < 100)
ORDER BY q.created_at DESC, q.votes_up DESC, q.votes_down ASC
LIMIT 4000, 20

Optimising MySQL query on JOINed tables with GROUP BY and ORDER BY without using nested queries

This feels like a bit of a beginner SQL question to me, but here goes. This is what I'm trying to do:
join three tables together, products, tags, and a linking table.
aggregate the tags into a single comma delimited field (hence the GROUP_CONCAT and the GROUP BY)
limit the results (to 30)
have the results in order of the 'created' date
avoid using subqueries where possible, as they're particularly unpleasant to code using an Active Record framework
I've described the tables involved at the bottom of this post, but here's the query that I'm performing
SELECT p.*, GROUP_CONCAT(pt.name)
FROM products p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
LIMIT 30;
There are about 280,000 products, 130 tags, 524,000 linking records and I've ANALYZEd the tables. The problem is that it's taking over 80s to run (on decent hardware), which feels wrong to me.
Here's the EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index NULL created 4 NULL 30 "Using temporary"
1 SIMPLE pt4p ref idx_product_tags_for_products idx_product_tags_for_products 3 s.id 1 "Using index"
1 SIMPLE pt eq_ref PRIMARY PRIMARY 4 pt4p.product_tag_id 1
I think it's doing things in the wrong order, i.e. ORDERing the results after the join, using a large temporary table, and then LIMITing. The query plan in my head would go something like this:
ORDER the products table using the 'created' key
Step through each row, LEFT JOINing it against the other tables until the LIMIT of 30 has been reached.
This sounds simple, but it doesn't seem to work like that - am I missing something?
CREATE TABLE `products` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`rating` float NOT NULL,
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`last_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`active` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `product_tags_for_products` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`product_id` mediumint(8) unsigned NOT NULL,
`product_tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_product_tags_for_products` (`product_id`,`product_tag_id`),
KEY `product_tag_id` (`product_tag_id`),
CONSTRAINT `product_tags_for_products_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`),
CONSTRAINT `product_tags_for_products_ibfk_2` FOREIGN KEY (`product_tag_id`) REFERENCES `product_tags` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `product_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Updated with profiling information at Salman A's request:
Status,Duration,CPU_user,CPU_system,Context_voluntary,Context_involuntary,Block_ops_in,Block_ops_out,Messages_sent,Messages_received,Page_faults_major,Page_faults_minor,Swaps,Source_function,Source_file,Source_line
starting,0.000124,0.000106,0.000015,0,0,0,0,0,0,0,0,0,NULL,NULL,NULL
"Opening tables",0.000022,0.000020,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_base.cc,4519
"System lock",0.000007,0.000004,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",lock.cc,258
"Table lock",0.000011,0.000009,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",lock.cc,269
init,0.000055,0.000054,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,2524
optimizing,0.000008,0.000006,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,833
statistics,0.000116,0.000051,0.000066,0,0,0,0,0,0,0,1,0,"unknown function",sql_select.cc,1024
preparing,0.000027,0.000023,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1046
"Creating tmp table",0.000054,0.000053,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1546
"Sorting for group",0.000018,0.000015,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1596
executing,0.000004,0.000002,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1780
"Copying to tmp table",0.061716,0.049455,0.013560,0,18,0,0,0,0,0,3680,0,"unknown function",sql_select.cc,1927
"converting HEAP to MyISAM",0.046731,0.006371,0.017543,3,5,0,3,0,0,0,32,0,"unknown function",sql_select.cc,10980
"Copying to tmp table on disk",10.700166,3.038211,1.191086,538,1230,1,31,0,0,0,65,0,"unknown function",sql_select.cc,11045
"Sorting result",0.777887,0.155327,0.618896,2,137,0,1,0,0,0,634,0,"unknown function",sql_select.cc,2201
"Sending data",0.000336,0.000159,0.000178,0,0,0,0,0,0,0,1,0,"unknown function",sql_select.cc,2334
end,0.000005,0.000003,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,2570
"removing tmp table",0.106382,0.000058,0.080105,4,9,0,11,0,0,0,0,0,"unknown function",sql_select.cc,10912
end,0.000015,0.000007,0.000007,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,10937
"query end",0.000004,0.000002,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,5083
"freeing items",0.000012,0.000012,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,6107
"removing tmp table",0.000010,0.000009,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,10912
"freeing items",0.000084,0.000022,0.000057,0,1,0,0,1,0,0,0,0,"unknown function",sql_select.cc,10937
"logging slow query",0.000004,0.000001,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1723
"logging slow query",0.000049,0.000031,0.000018,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1733
"cleaning up",0.000007,0.000005,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1691
The tables are:
Products = 84.1MiB (there are extra fields in the products table which I omitted for clarity)
Tags = 32KiB
Linking table = 46.6MiB
I would try limiting the number of products to 30 first and then joining with only 30 products:
SELECT p.*, GROUP_CONCAT(pt.name) as tags
FROM (SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT 30) p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
I know you said no subqueries but you did not explain why, and I don't see any other way to solve your issue.
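As a rough illustration of "limit first, then join", the sketch below runs the same query shape on Python's sqlite3 with a handful of made-up rows (a stand-in for MySQL, with LIMIT 2 instead of 30). Only the limited products reach the joins and the GROUP_CONCAT, which is the point of the rewrite.

```python
import sqlite3

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT NOT NULL, created TEXT NOT NULL);
CREATE TABLE product_tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE product_tags_for_products (
    product_id INTEGER NOT NULL,
    product_tag_id INTEGER NOT NULL,
    PRIMARY KEY (product_id, product_tag_id));
CREATE INDEX products_created ON products (created);
INSERT INTO products VALUES (1, 'kettle', '2011-01-03'), (2, 'toaster', '2011-01-01'),
                            (3, 'blender', '2011-01-02'), (4, 'untagged', '2011-01-04');
INSERT INTO product_tags VALUES (1, 'kitchen'), (2, 'steel');
INSERT INTO product_tags_for_products VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

# The derived table cuts products down to the first rows by `created`
# before any tag rows are joined or concatenated.
LIMIT_FIRST = """
SELECT p.id, p.title, GROUP_CONCAT(pt.name) AS tags
FROM (SELECT * FROM products ORDER BY created LIMIT 2) AS p
LEFT JOIN product_tags_for_products pt4p ON pt4p.product_id = p.id
LEFT JOIN product_tags pt ON pt.id = pt4p.product_tag_id
GROUP BY p.id
ORDER BY p.created
"""

rows = conn.execute(LIMIT_FIRST).fetchall()
for product_id, title, tags in rows:
    print(product_id, title, tags)
```

The order of names inside the concatenated tags column is not guaranteed unless the aggregate is given an explicit ordering, so code that consumes it should not rely on it.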
Note that you can eliminate the subselect by putting that in a view:
CREATE VIEW v_last30products AS
SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT 30;
Then the query is simplified to:
SELECT p.*, GROUP_CONCAT(pt.name) as tags
FROM v_last30products p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
Another issue: your n-to-n table product_tags_for_products does not make sense. I'd restructure it like so:
CREATE TABLE `product_tags_for_products` (
`product_id` mediumint(8) unsigned NOT NULL,
`product_tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`product_id`,`product_tag_id`),
CONSTRAINT `product_tags_for_products_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`),
CONSTRAINT `product_tags_for_products_ibfk_2` FOREIGN KEY (`product_tag_id`) REFERENCES `product_tags` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This should make the query faster by:
- shortening the key used (in InnoDB the PK is always included in secondary keys);
- allowing you to use the PK, which should be faster than using a secondary key.
More speed issues
If you replace the SELECT * with only the fields that you need (SELECT p.title, p.rating, ... FROM), that will also speed things up a little.
Ah - I see that none of the keys you GROUP BY on are BTREE, by default PRIMARY keys are hashes.
It helps group by when there is an ordering index... otherwise it has to scan...
What I mean is, I think it would help significantly if you added a BTREE based index for p.id and p.created. In that case I think the engine will avoid having to scan/sort all those keys to execute group by and order by.
Regarding filtering on tags (which you mentioned in the comments on Johan's answer), if the obvious
SELECT p.*, GROUP_CONCAT(pt.name) AS tags
FROM products p
JOIN product_tags_for_products pt4p2 ON (pt4p2.product_id = p.id)
JOIN product_tags pt2 ON (pt2.id = pt4p2.product_tag_id)
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
WHERE pt2.name IN ('some', 'tags', 'here')
GROUP BY p.id
ORDER BY p.created LIMIT 30
doesn't run fast enough, you could always try this:
CREATE TEMPORARY TABLE products30
SELECT p.*
FROM products p
JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
WHERE pt.name IN ('some', 'tags', 'here')
GROUP BY p.id
ORDER BY p.created LIMIT 30;
SELECT p.*, GROUP_CONCAT(pt.name) AS tags
FROM products30 p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
(I used a temp table because you said "no subqueries"; I don't know if they're any easier to use in an Active Record framework, but at least it's another way to do it.)
Ps. One really off-the-wall idea about your original problem: would it make any difference if you changed the GROUP BY p.id clause to GROUP BY p.created, p.id? Probably not, but I'd at least try it.

MySQL ORDER BY DESC is fast but ASC is very slow

For some reason when I sort this query by DESC it's super fast, but if sorted by ASC it's extremely slow.
This takes about 150 milliseconds:
SELECT posts.id
FROM posts USE INDEX (published)
WHERE posts.feed_id IN ( 4953,622,1,1852,4952,76,623,624,10 )
ORDER BY posts.published DESC
LIMIT 0, 50;
This takes about 32 seconds:
SELECT posts.id
FROM posts USE INDEX (published)
WHERE posts.feed_id IN ( 4953,622,1,1852,4952,76,623,624,10 )
ORDER BY posts.published ASC
LIMIT 0, 50;
The EXPLAIN is the same for both queries.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts index NULL published 5 NULL 50 Using where
I've tracked it down to "USE INDEX (published)". If I take that out it's the same performance both ways. But the EXPLAIN shows the query is less efficient overall.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts range feed_id feed_id 4 \N 759 Using where; Using filesort
And here's the table.
CREATE TABLE `posts` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL,
`post_url` varchar(255) NOT NULL,
`title` varchar(255) NOT NULL,
`content` blob,
`author` varchar(255) DEFAULT NULL,
`published` int(12) DEFAULT NULL,
`updated` datetime NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `post_url` (`post_url`,`feed_id`),
KEY `feed_id` (`feed_id`),
KEY `published` (`published`)
) ENGINE=InnoDB AUTO_INCREMENT=196530 DEFAULT CHARSET=latin1;
Is there a fix for this?
Your index is sorted desc so when you ask for ascending it needs to do a lot more work to bring it back in that order
I wouldn't suggest you create another index on the table; every time a row is inserted or deleted, each index on the table needs to be updated, slowing down INSERT queries.
The index is definitely what's slowing it down. Maybe you could try IGNORE-ing it:
SELECT posts.id
FROM posts IGNORE INDEX (published)
WHERE posts.feed_id IN ( 4953,622,1,1852,4952,76,623,624,10 )
ORDER BY posts.published ASC
LIMIT 0, 50;
Or, since the field is already KEYed, you might try the following:
SELECT posts.id
FROM posts USE KEY (published)
WHERE posts.feed_id IN ( 4953,622,1,1852,4952,76,623,624,10 )
ORDER BY posts.published ASC
LIMIT 0, 50;
You could get your data set first, and then order it.
Something like
SELECT t.id FROM (
SELECT posts.id
FROM posts USE INDEX (published)
WHERE posts.feed_id IN ( 4953,622,1,1852,4952,76,623,624,10 )
LIMIT 0, 50
) AS t
ORDER BY t.id ASC;
It should first use the index to find the records that satisfy your WHERE clause, and then order them. The sort would then be performed on a smaller set. Give it a try and then tell us.
Best Regards.
You want to add an index across (feed_id, published):
ALTER TABLE posts ADD INDEX (feed_id, published)
That'll make this query run best, and you won't need to force a particular index with USE INDEX.
How about flipping the WHERE condition?
SELECT posts.id
FROM posts USE INDEX (published)
WHERE posts.feed_id IN ( 10,624,623,76,4952,1852,622,4953 )
ORDER BY posts.published DESC;