Need some help deciphering MySQL EXPLAIN output for a complex join
My site's homepage has a complex query that looks like this:
SELECT karmalog.*, image.title as img_title, image.date_uploaded, imagefile.file_name as img_filename, imagefile.width as img_width, imagefile.height as img_height, imagefile.transferred as img_transferred, u1.uname as usr_name1, u2.uname as usr_name2, u1.avat_url as usr_avaturl1, u2.avat_url as usr_avaturl2, class.title as class_title,forum.id as f_id, forum.name as f_name, forum.icon, forumtopic.id as ft_id, forumtopic.subject
FROM karmalog
LEFT JOIN image on karmalog.event_type = 'image' and karmalog.object_id = image.id
LEFT JOIN imagefile on karmalog.object_id = imagefile.image_id and imagefile.type = 'smallthumb'
LEFT JOIN class on karmalog.event_type = 'class' and karmalog.object_id = class.num
LEFT JOIN user as u1 on karmalog.user_id = u1.id
LEFT JOIN user as u2 on karmalog.user_sec_id = u2.id
LEFT JOIN forumtopic on karmalog.object_id = forumtopic.id and karmalog.event IN ('FORUM_REPLY','FORUM_CREATE')
LEFT JOIN forum on forumtopic.forum_id = forum.id
WHERE karmalog.event IN ('EDIT_PROFILE','FAV_IMG_ADD','FOLLOW','COM_POST','IMG_UPLOAD','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','CLASS_DOWN','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE')
AND karmalog.delete=0
ORDER BY karmalog.date_created DESC, karmalog.id DESC
LIMIT 0,13
I won't bore you with the exact details, but here is a short explanation: this is a list of events that happened in the system, kind of like a stream. An event can be of several types, and based on its type it needs to join in specific data from various tables.
Currently, this query takes 2 seconds to run, and it will get slower over time as the number of entries grows. Therefore I'm looking to optimize it. Here's the output of MySQL EXPLAIN:
My understanding of EXPLAIN is too limited to interpret this. I would prefer to keep this query as is (instead of denormalizing), but improve its performance using appropriate indexes or other quick wins. Based on this EXPLAIN output, is there anything you see that I can follow up on?
Edit: as requested, here is the definition of the karmalog table:
CREATE TABLE `karmalog` (
`id` int(11) NOT NULL auto_increment,
`guid` char(36) default NULL,
`user_id` int(11) default NULL,
`user_sec_id` int(11) default NULL,
`event` enum('EDIT_PROFILE','EDIT_AVATAR','EDIT_EMAIL','EDIT_PASSWORD','FAV_IMG_ADD','FAV_IMG_ADDED','FAV_IMG_REMOVE','FAV_IMG_REMOVED','FOLLOW','FOLLOWED','UNFOLLOW','UNFOLLOWED','COM_POSTED','COM_POST','COM_VOTE','COM_VOTED','IMG_VOTED','IMG_UPLOAD','LIST_CREATE','LIST_DELETE','LIST_ADMINDELETE','LIST_VOTE','LIST_VOTED','IMG_UPD','IMG_RESTORE','IMG_UPD_LIC','IMG_UPD_MOD','IMG_UPD_MODERATED','IMG_VOTE','IMG_VOTED','TAG_FAV_ADD','CLASS_DOWN','CLASS_UP','IMG_DELETE','IMG_ADMINDELETE','IMG_ADMINDELETEFAV','SET_PASSWORD','IMG_RESTORED','IMG_VIEW','FORUM_CREATE','FORUM_DELETE','FORUM_ADMINDELETE','FORUM_REPLY','FORUM_DELETEREPLY','FORUM_ADMINDELETEREPLY','FORUM_SUBSCRIBE','FORUM_UNSUBSCRIBE','TAG_INFO_EDITED','JOIN') NOT NULL,
`event_type` enum('follow','tag','image','class','list','forum','user') NOT NULL,
`active` bit(1) NOT NULL,
`delete` bit(1) NOT NULL default '\0',
`object_id` int(11) default NULL,
`object_cache` varchar(1024) default NULL,
`karma_delta` int(11) NOT NULL,
`gold_delta` int(11) NOT NULL,
`newkarma` int(11) NOT NULL,
`newgold` int(11) NOT NULL,
`mail_processed` bit(1) NOT NULL default '\0',
`date_created` timestamp NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `user_sec_id` (`user_sec_id`),
KEY `image_id` (`object_id`),
CONSTRAINT `user_id` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE SET NULL,
CONSTRAINT `user_sec_id` FOREIGN KEY (`user_sec_id`) REFERENCES `user` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
First, you are probably missing a composite index on (event_type, object_id). On second reading, disregard this. You may need such an index for other queries but not for this one (because of the ORDER BY ... LIMIT).
Second, you don't have an index on date_created, yet you ORDER BY this column. Add an index on it. Taking the WHERE conditions into account too, the best index may be (delete, date_created), (event, date_created), or (probably best) (event, delete, date_created).
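A minimal sketch of what the (delete, date_created) variant buys, using Python's built-in sqlite3 as a stand-in for MySQL. This is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN wording differ from MySQL, where the extra sort shows up as "Using filesort". The trimmed karmalog table and the index name idx_delete_date are hypothetical; id is appended so the tie-breaker in the ORDER BY is covered as well.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical karmalog; "delete" is a keyword in SQLite too, so quote it.
conn.execute("""
    CREATE TABLE karmalog (
        id           INTEGER PRIMARY KEY,
        event        TEXT    NOT NULL,
        "delete"     INTEGER NOT NULL DEFAULT 0,
        date_created TEXT    NOT NULL
    )""")

QUERY = """
    SELECT id FROM karmalog
    WHERE event IN ('JOIN', 'FOLLOW', 'COM_POST') AND "delete" = 0
    ORDER BY date_created DESC, id DESC
    LIMIT 13"""

before = plan_details(conn, QUERY)  # no usable index: a separate sort step is planned
conn.execute('CREATE INDEX idx_delete_date ON karmalog ("delete", date_created, id)')
after = plan_details(conn, QUERY)   # rows can now be read in index order, no sort

print("before:", before)
print("after: ", after)
```

The equality on delete pins the leading index column, so the remaining (date_created, id) order matches the ORDER BY and the engine can stop after 13 qualifying rows instead of sorting everything first.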
Third, try to rewrite it like this:
LIMIT first, then JOIN (corrected):
SELECT karmalog.*, image.title as img_title, image.date_uploaded, imagefile.file_name as img_filename, imagefile.width as img_width, imagefile.height as img_height, imagefile.transferred as img_transferred, u1.uname as usr_name1, u2.uname as usr_name2, u1.avat_url as usr_avaturl1, u2.avat_url as usr_avaturl2, class.title as class_title,forum.id as f_id, forum.name as f_name, forum.icon, forumtopic.id as ft_id, forumtopic.subject
FROM
( SELECT *
FROM karmalog
WHERE karmalog.event IN ('EDIT_PROFILE','FAV_IMG_ADD','FOLLOW','COM_POST','IMG_UPLOAD','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','CLASS_DOWN','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE')
AND karmalog.delete=0
ORDER BY karmalog.date_created DESC, karmalog.id DESC
LIMIT 0,13
) AS karmalog
LEFT JOIN image on karmalog.event_type = 'image' and karmalog.object_id = image.id
LEFT JOIN imagefile on karmalog.object_id = imagefile.image_id and imagefile.type = 'smallthumb'
LEFT JOIN class on karmalog.event_type = 'class' and karmalog.object_id = class.num
LEFT JOIN user as u1 on karmalog.user_id = u1.id
LEFT JOIN user as u2 on karmalog.user_sec_id = u2.id
LEFT JOIN forumtopic on karmalog.object_id = forumtopic.id and karmalog.event IN ('FORUM_REPLY','FORUM_CREATE')
LEFT JOIN forum on forumtopic.forum_id = forum.id
ORDER BY karmalog.date_created DESC, karmalog.id DESC
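A small sketch of why the rewrite returns the same page, again with Python's sqlite3 standing in for MySQL. The schema keeps only two of the joins (image and user) and the rows are made-up sample data; the page size is 2 instead of 13 because the data set is tiny.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user  (id INTEGER PRIMARY KEY, uname TEXT);
    CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE karmalog (
        id INTEGER PRIMARY KEY, user_id INTEGER, event TEXT, event_type TEXT,
        "delete" INTEGER NOT NULL DEFAULT 0, object_id INTEGER, date_created TEXT);
    INSERT INTO user  VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO image VALUES (10, 'sunset'), (11, 'harbour');
    INSERT INTO karmalog VALUES
        (1, 1, 'IMG_UPLOAD',  'image', 0, 10, '2011-01-01'),
        (2, 2, 'FOLLOW',      'user',  0, 1,  '2011-01-02'),
        (3, 1, 'IMG_VOTE',    'image', 1, 11, '2011-01-03'),  -- deleted
        (4, 2, 'IMG_UPLOAD',  'image', 0, 11, '2011-01-04'),
        (5, 1, 'EDIT_AVATAR', 'user',  0, 1,  '2011-01-05');  -- event not requested
""")

FILTER = """karmalog.event IN ('IMG_UPLOAD', 'IMG_VOTE', 'FOLLOW')
            AND karmalog."delete" = 0"""
ORDER = "ORDER BY karmalog.date_created DESC, karmalog.id DESC"
JOINS = """LEFT JOIN image ON karmalog.event_type = 'image' AND karmalog.object_id = image.id
           LEFT JOIN user AS u1 ON karmalog.user_id = u1.id"""

# Original shape: join everything, then sort and cut to the page size.
join_then_limit = conn.execute(f"""
    SELECT karmalog.id, u1.uname, image.title
    FROM karmalog {JOINS}
    WHERE {FILTER} {ORDER} LIMIT 2""").fetchall()

# Rewritten shape: pick the page of karmalog rows first, then join only those rows.
limit_then_join = conn.execute(f"""
    SELECT karmalog.id, u1.uname, image.title
    FROM (SELECT * FROM karmalog WHERE {FILTER} {ORDER} LIMIT 2) AS karmalog
    {JOINS}
    {ORDER}""").fetchall()

print(join_then_limit)
print(limit_then_join)
```

The two shapes agree here because each LEFT JOIN matches at most one row per karmalog entry. If a join could match several rows (for example, more than one 'smallthumb' row per image in imagefile), the original query would count those duplicated rows against its LIMIT, and the results could differ.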
The important columns in the EXPLAIN output are possible_keys, key and rows.
If there are no possible keys, you need to create indexes.
If no key is used, it may be due to:
low cardinality of the key;
usage of functions;
Focus your efforts on the table with the highest number of rows, i.e. karmalog.
Remember that MySQL can generally use only one index per table in a given SELECT (index merge is the exception).
The joins are all LEFT JOINs, so they do not limit the row count in karmalog; indexes will not help you there.
Looking at the WHERE part, delete has low cardinality (only 2 values, 90% of which will be 0). So only the fields event and date_created qualify for an index; put an index on:
ALTER TABLE karmalog ADD INDEX date_event (event, date_created);
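To confirm that an index like date_event is actually chosen, you would check the key column of MySQL's EXPLAIN. A rough analogue with Python's sqlite3 is sketched below; SQLite is only a stand-in here, its plan wording differs from MySQL's, and the trimmed karmalog table is hypothetical. With a multi-value IN list on event, the plan may still include a separate sort step, since rows for several event values do not come out of the index in one overall date_created order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE karmalog (id INTEGER PRIMARY KEY, event TEXT NOT NULL,
                 "delete" INTEGER NOT NULL DEFAULT 0, date_created TEXT NOT NULL)""")
# Same column order as the ALTER TABLE above: event first, then date_created.
conn.execute("CREATE INDEX date_event ON karmalog (event, date_created)")

SQL = """SELECT id FROM karmalog
         WHERE event IN ('JOIN', 'FOLLOW', 'COM_POST') AND "delete" = 0
         ORDER BY date_created DESC LIMIT 13"""

# Each plan row is (id, parent, notused, detail); print the detail text.
details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL)]
for d in details:
    print(d)
```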
Try adding an index on the karmalog table, definitely on event (and maybe on delete and object_id), as this will make it faster and give it a key for the first join.
Second, look at this table and figure out whether you could do this with some sort of join to make it lighter in the future. But that would probably mean a change to your database.
karmalog.event IN ('EDIT_PROFILE','FAV_IMG_ADD','FOLLOW','COM_POST','IMG_UPLOAD','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','CLASS_DOWN','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE')
Related
MySQL - how to optimize query with order by
I am trying to generate a list of the 5 most recent history items for a collection of user tasks. If I remove the ORDER BY, the execution time drops from ~2 seconds to < 20 ms.
Indexes are on h.task_id, h.mod_date, i.task_id and i.user_id.
This is the query:
SELECT h.*, i.task_id, i.user_id, i.name, i.completed
FROM h, i
WHERE i.task_id = h.task_id
AND i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
Here is the EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | i | ref | PRIMARY,UserID | UserID | 4 | const | 3091 | Using temporary; Using filesort
1 | SIMPLE | h | ref | TaskID | TaskID | 4 | myDB.i.task_id | 7 |
Here are the SHOW CREATE TABLE outputs:
CREATE TABLE `h` (
`history_id` int(6) NOT NULL AUTO_INCREMENT,
`history_code` tinyint(4) NOT NULL DEFAULT '0',
`task_id` int(6) NOT NULL,
`mod_date` datetime NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`history_id`),
KEY `TaskID` (`task_id`),
KEY `historyCode` (`history_code`),
KEY `modDate` (`mod_date`)
) ENGINE=InnoDB AUTO_INCREMENT=185647 DEFAULT CHARSET=latin1
and
CREATE TABLE `i` (
`task_id` int(6) NOT NULL AUTO_INCREMENT,
`user_id` int(6) NOT NULL,
`name` varchar(60) NOT NULL,
`due_date` date DEFAULT NULL,
`create_date` date NOT NULL,
`completed` tinyint(1) NOT NULL DEFAULT '0',
`task_description` blob,
PRIMARY KEY (`task_id`),
KEY `name_2` (`name`),
KEY `UserID` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12085 DEFAULT CHARSET=latin1
INDEX(task_id, mod_date, history_id) -- in this order
will be "covering", and the columns will be in the optimal order.
Also, DROP KEY `TaskID` so that the optimizer won't be tempted to use it.
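A quick way to see the "covering" effect, sketched with Python's sqlite3 as a stand-in for MySQL (assumption: SQLite reports it as "USING COVERING INDEX", while MySQL shows "Using index" in the Extra column). The tables are cut down from the question and the index name task_date is hypothetical; history_id is h's primary key, so in SQLite it is the rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i (task_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, name TEXT);
    CREATE INDEX UserID ON i (user_id);
    CREATE TABLE h (history_id INTEGER PRIMARY KEY, task_id INTEGER NOT NULL,
                    mod_date TEXT NOT NULL, description TEXT NOT NULL);
    -- Compound index from the answer; the single-column TaskID index is left out,
    -- mirroring the advice to drop it.
    CREATE INDEX task_date ON h (task_id, mod_date, history_id);
""")

# Only columns stored in the index are requested, so h's wide rows are never read.
SQL = """
    SELECT h.history_id, h.mod_date
    FROM i JOIN h ON h.task_id = i.task_id
    WHERE i.user_id = 42
    ORDER BY h.mod_date DESC
    LIMIT 5"""

details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL)]
for detail in details:
    print(detail)
```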
Try changing the index on h.task_id so it's this compound index.
CREATE OR REPLACE INDEX TaskID ON h(task_id, mod_date DESC);
(CREATE OR REPLACE INDEX is MariaDB syntax; on MySQL, drop the old TaskID index and create the new one.)
This may (or may not) allow MySQL to shortcut some or all of the extra work in your ORDER BY ... LIMIT ... request. It's a notorious performance antipattern, by the way, but sometimes necessary.
Edit: the index didn't help. So let's try a so-called deferred join, so we don't have to ORDER and then LIMIT all the data from your h table.
Start with this subquery. It retrieves only the primary key values for the rows involved in your results, and will generate just five rows.
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
Why this subquery? It handles the work-intensive ORDER BY ... LIMIT operation while manipulating only the primary keys and the date. It still must sort tons of rows only to discard all but five, but the rows it has to handle are much shorter. Because this subquery does the heavy work, you can focus on optimizing it rather than the whole query. Keep the index I suggested above, because it covers the subquery for h.
Then join it to the rest of your query like this. That way you'll only have to retrieve the expensive h.description column for the five rows you care about.
SELECT h.*, i.task_id, i.user_id, i.name, i.completed
FROM h
JOIN i ON i.task_id = h.task_id
JOIN (
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
) selected ON h.history_id = selected.history_id AND i.task_id = selected.task_id
ORDER BY h.mod_date DESC
LIMIT 5
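A minimal sketch checking that the deferred join returns the same five rows as the direct query, using Python's sqlite3 as a stand-in for MySQL. The tables are trimmed versions of h and i, and the data (tasks 1 to 3 owned by user 42, one history row per day) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE i (task_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,
                    name TEXT, completed INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE h (history_id INTEGER PRIMARY KEY, task_id INTEGER NOT NULL,
                    mod_date TEXT NOT NULL, description TEXT NOT NULL);
    CREATE INDEX task_date ON h (task_id, mod_date, history_id);
""")
# Hypothetical data: tasks 1-3 belong to user 42, task 4 to another user.
conn.executemany("INSERT INTO i (task_id, user_id, name) VALUES (?, ?, ?)",
                 [(1, 42, "a"), (2, 42, "b"), (3, 42, "c"), (4, 7, "d")])
# One history row per day; history_id is assigned 1..10 in insertion order.
for day, task in enumerate([1, 2, 3, 4, 1, 2, 4, 3, 1, 4], start=1):
    conn.execute("INSERT INTO h (task_id, mod_date, description) VALUES (?, ?, ?)",
                 (task, f"2020-01-{day:02d}", "long description text"))

direct = conn.execute("""
    SELECT h.history_id FROM h JOIN i ON i.task_id = h.task_id
    WHERE i.user_id = 42 ORDER BY h.mod_date DESC LIMIT 5""").fetchall()

# Deferred join: the inner query sorts only narrow (key, date) rows;
# the outer query fetches the wide h columns for the five survivors.
deferred = conn.execute("""
    SELECT h.history_id, h.description, i.name, i.completed
    FROM h
    JOIN i ON i.task_id = h.task_id
    JOIN (SELECT h.history_id, i.task_id
          FROM h JOIN i ON h.task_id = i.task_id
          WHERE i.user_id = 42
          ORDER BY h.mod_date DESC LIMIT 5) AS selected
      ON h.history_id = selected.history_id AND i.task_id = selected.task_id
    ORDER BY h.mod_date DESC
    LIMIT 5""").fetchall()

print([row[0] for row in direct])
print([row[0] for row in deferred])
```

Note that the inner query must sort DESC as well. An ascending inner sort would pick the five oldest rows, not the five newest.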
SELECT taking 9+ seconds. How can I rewrite it better?
I have this select:
select t.id, c.user, t.title, pp.foto, t.data
from topics t
inner join cadastro c on t.user = c.id
left join profile_picture pp on t.user = pp.user
left join (
select c.topic, MAX(c.data) cdata
from comments c
group by c.topic
) c on t.id = c.topic
where t.community = ?
order by ifnull(cdata, t.data) desc
limit 15
I want to select topics and order them by their date, or by the date of their latest comment if they have comments. Unfortunately, this is taking more than 9 seconds. I don't think the problem here is indexing, but the way I am writing the select itself.
`topics` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`user` INT(11) UNSIGNED NOT NULL,
`title` varchar(100) NOT NULL,
`description` varchar(1000),
`community` INT(11) UNSIGNED NOT NULL,
`data` datetime NOT NULL,
`ip` varchar(20),
PRIMARY KEY (`id`),
FOREIGN KEY (`user`) REFERENCES cadastro (`id`),
FOREIGN KEY (`community`) REFERENCES discussion (`id`)
)
`comments` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`user` INT(11) UNSIGNED NOT NULL,
`comment` varchar(1000) NOT NULL,
`topic` INT(11) UNSIGNED NOT NULL,
`data` datetime NOT NULL,
`ip` varchar(20),
`delete` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
FOREIGN KEY (`user`) REFERENCES cadastro (`id`),
FOREIGN KEY (`topic`) REFERENCES topics (`id`)
)
Your EXPLAIN gives you a strong hint. The first row in its results says "Using temporary; Using filesort", implying that it's not using an index.
It might be possible to improve this query by adding indexes and removing some conditionals, but I think in this case a better solution exists. Why not add a new column to topics that indicates the last time a comment was added (like a last_modified)? Every time a comment gets added, just update that column for that topic as well. It's effectively denormalizing this. I think this is a valid use case, and it's always going to be faster than fixing this messy query.
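One way to keep such a column up to date is a trigger; the answer only says to update it whenever a comment is added, so doing it in application code is equally valid. The sketch below uses Python's sqlite3 as a stand-in (MySQL's trigger syntax differs, e.g. CREATE TRIGGER ... AFTER INSERT ON comments FOR EACH ROW ...). The column last_activity, the trigger comments_touch_topic and the index community_activity are hypothetical names, and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (
        id INTEGER PRIMARY KEY, title TEXT, community INTEGER NOT NULL,
        data TEXT NOT NULL,
        last_activity TEXT NOT NULL          -- hypothetical denormalized column
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, topic INTEGER NOT NULL,
        comment TEXT NOT NULL, data TEXT NOT NULL
    );
    -- Bump the topic's last_activity whenever a newer comment arrives.
    CREATE TRIGGER comments_touch_topic AFTER INSERT ON comments
    BEGIN
        UPDATE topics SET last_activity = NEW.data
        WHERE id = NEW.topic AND last_activity < NEW.data;
    END;
    CREATE INDEX community_activity ON topics (community, last_activity);
""")
# A new topic starts with last_activity equal to its own creation date.
conn.executemany(
    "INSERT INTO topics (id, title, community, data, last_activity) VALUES (?, ?, ?, ?, ?)",
    [(1, "old topic", 5, "2020-01-01", "2020-01-01"),
     (2, "new topic", 5, "2020-01-05", "2020-01-05"),
     (3, "quiet topic", 5, "2020-01-03", "2020-01-03")])
conn.executemany("INSERT INTO comments (topic, comment, data) VALUES (?, ?, ?)",
                 [(1, "bump", "2020-01-10"), (3, "reply", "2020-01-04")])

# The listing query no longer needs the GROUP BY subquery over comments.
order = [row[0] for row in conn.execute(
    "SELECT id FROM topics WHERE community = ? ORDER BY last_activity DESC LIMIT 15", (5,))]
print(order)
```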
You are performing a full table scan on the table comments on every query. How many rows does it have? At least create the following index, comments (topic, data), to avoid reading the whole table every time.
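A sketch of what that index changes for the "latest comment per topic" subquery, using Python's sqlite3 as a stand-in for MySQL (plan wording differs between the two). The index name topic_data is hypothetical. The index is still read entry by entry, but it is far narrower than the table rows (comment is varchar(1000)) and already in (topic, data) order, so no separate sort is needed for the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (id INTEGER PRIMARY KEY, user INTEGER NOT NULL,
                                      comment TEXT NOT NULL, topic INTEGER NOT NULL,
                                      data TEXT NOT NULL)""")

LATEST_PER_TOPIC = "SELECT topic, MAX(data) FROM comments GROUP BY topic"

def plan_details(sql):
    # The 4th column of each EXPLAIN QUERY PLAN row is the readable description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan_details(LATEST_PER_TOPIC)  # full table scan plus a sort for GROUP BY
conn.execute("CREATE INDEX topic_data ON comments (topic, data)")
after = plan_details(LATEST_PER_TOPIC)   # scan of the narrow index, already grouped
print(before)
print(after)
```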
I know you've said you don't think the problem is indexing, but 9 out of 10 times I've had this problem, that's exactly what it's been down to. Ensure you have an index on each table that you're using in the query, and include the columns specified in the join. Also, as NiVeR said, don't use the same alias multiple times.
Here's a refactoring of that query, though I'm not sure I haven't mixed up or missed a column name/alias or two:
select t.id, c.user, t.title, pp.foto, t.data
from topics t
inner join cadastro c on t.user = c.id
left join profile_picture pp on t.user = pp.user
left join (
select com.topic, MAX(com.data) comdata
from comments com
group by com.topic
) com1 on t.id = com1.topic
where t.community = ?
order by ifnull(com1.comdata, t.data) desc
limit 15
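A minimal run of the refactored query on made-up rows, using Python's sqlite3 as a stand-in for MySQL (SQLite also has IFNULL). The table layouts follow the question, trimmed to the columns the query touches. It shows a topic with a recent comment sorting ahead of a newer topic without comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cadastro (id INTEGER PRIMARY KEY, user TEXT);
    CREATE TABLE profile_picture (user INTEGER PRIMARY KEY, foto TEXT);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, user INTEGER NOT NULL, title TEXT,
                         community INTEGER NOT NULL, data TEXT NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, user INTEGER NOT NULL,
                           topic INTEGER NOT NULL, data TEXT NOT NULL);
    INSERT INTO cadastro VALUES (1, 'ana'), (2, 'ben');
    INSERT INTO profile_picture VALUES (1, 'ana.png');
    INSERT INTO topics VALUES (1, 1, 'old but active', 5, '2020-01-01'),
                              (2, 2, 'newer, no replies', 5, '2020-01-05'),
                              (3, 1, 'other community', 6, '2020-01-09');
    INSERT INTO comments VALUES (1, 2, 1, '2020-01-10');
""")

SQL = """
    SELECT t.id, c.user, t.title, pp.foto, t.data
    FROM topics t
    INNER JOIN cadastro c ON t.user = c.id
    LEFT JOIN profile_picture pp ON t.user = pp.user
    LEFT JOIN (SELECT com.topic, MAX(com.data) AS comdata
               FROM comments com
               GROUP BY com.topic) com1 ON t.id = com1.topic
    WHERE t.community = ?
    ORDER BY IFNULL(com1.comdata, t.data) DESC
    LIMIT 15"""

rows = conn.execute(SQL, (5,)).fetchall()
for row in rows:
    print(row)
```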
Query taking too long, while splitting it into two queries takes 0.2 sec
I have the following query:
select m.id, ms.severity, ms.risk_score, count(distinct si.id), boarding_date_tbl.boarding_date
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
join (select m.id merchant_id, min(s_for_boarding.scan_date) boarding_date
from merchant m
left join merchant_has_scan ms on m.id = ms.merchant_id
left join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id) boarding_date_tbl on boarding_date_tbl.merchant_id = m.id
group by m.id
limit 100;
When I run it on a big schema (about 2 million merchants) it takes more than 20 seconds. But if I split it into:
select m.legal_name, m.unique_id, m.merchant_status, s_for_boarding.scan_date
from merchant m
join merchant_has_scan ms on m.id = ms.merchant_id
join scan s_for_boarding on s_for_boarding.id = ms.scan_id and s_for_boarding.scan_type = 1
group by m.id
limit 100;
and
select m.id, ms.severity, ms.risk_score, count(distinct si.id)
from merchant m
join merchant_has_scan ms on m.last_scan_completed_id = ms.id
join scan_item si on si.merchant_has_scan_id = ms.id and si.is_registered = true
group by m.id
limit 100;
both take about 0.1 sec. The reason for that is clear: the low limit means it doesn't need to do much to get the first 100. It is also clear that the inner select is what makes the first query run as long as it does. My question: is there a way to do the inner select only on the relevant merchants and not on the entire table?
Update: making a LEFT JOIN instead of a JOIN before the inner query helped reduce it to 6 seconds, but that is still a lot more than what I get with 2 queries.
Update 2: CREATE TABLE for merchant:
CREATE TABLE `merchant` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`last_scan_completed_id` bigint(20) DEFAULT NULL,
`last_updated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
CONSTRAINT `FK_9lhkm7tb4bt87qy4j3fjayec5` FOREIGN KEY (`last_scan_completed_id`) REFERENCES `merchant_has_scan` (`id`)
)
merchant_has_scan:
CREATE TABLE `merchant_has_scan` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`merchant_id` bigint(20) NOT NULL,
`risk_score` int(11) DEFAULT NULL,
`scan_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_merchant_id` (`scan_id`,`merchant_id`),
CONSTRAINT `FK_3d8f81ts5wj2u99ddhinfc1jp` FOREIGN KEY (`scan_id`) REFERENCES `scan` (`id`),
CONSTRAINT `FK_e7fhioqt9b9rp9uhvcjnk31qe` FOREIGN KEY (`merchant_id`) REFERENCES `merchant` (`id`)
)
scan_item:
CREATE TABLE `scan_item` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`is_registered` bit(1) NOT NULL,
`merchant_has_scan_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `FK_avcc5q3hkehgreivwhoc5h7rb` FOREIGN KEY (`merchant_has_scan_id`) REFERENCES `merchant_has_scan` (`id`)
)
scan:
CREATE TABLE `scan` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`scan_date` datetime DEFAULT NULL,
`scan_type` int(11) NOT NULL,
PRIMARY KEY (`id`)
)
and the EXPLAIN:
You don't have the latest version of MySQL, which would be able to create an index for the derived table. (What version are you running?) The "derived table" (the subquery) will be the first table in the EXPLAIN because, well, it has to be.
merchant_has_scan is a many:many table, but without the optimization tips here -- fixing this may be the biggest factor in speeding it up. Caveat: the tips suggest getting rid of id, but you seem to have a use for id, so keep it.
The COUNT(DISTINCT si.id) and JOIN si... can be replaced by (SELECT COUNT(*) FROM scan_item WHERE ...), thereby eliminating one of the JOINs and possibly diminishing the Explode-Implode.
LEFT JOIN -- are you sometimes expecting to get NULL for boarding_date? If not, please use JOIN, not LEFT JOIN. (It is better to state your intention than to leave the query open to multiple interpretations.)
If you can remove the LEFTs, then since m.id and merchant_id are specified to be equal, why list them both in the SELECT? (This is a confusion factor, not a speed question.)
You say you split it into two -- but you did not. You added LIMIT 100 to the inner query when you pulled it out. If you need that, add it to the derived table, too. Then you may be able to remove GROUP BY m.id LIMIT 100 from the outer query.
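A small sketch of the suggested COUNT(*) subquery next to the JOIN + COUNT(DISTINCT) shape, using Python's sqlite3 as a stand-in for MySQL. The tables are trimmed from the question, the rows are sample data, and the index scan_item_mhs is a hypothetical addition. The two agree for merchants that have registered items; the correlated version also keeps merchants with none (count 0), which the inner JOIN drops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchant_has_scan (id INTEGER PRIMARY KEY, merchant_id INTEGER NOT NULL,
                                    risk_score INTEGER);
    CREATE TABLE merchant (id INTEGER PRIMARY KEY, last_scan_completed_id INTEGER);
    CREATE TABLE scan_item (id INTEGER PRIMARY KEY, is_registered INTEGER NOT NULL,
                            merchant_has_scan_id INTEGER NOT NULL);
    CREATE INDEX scan_item_mhs ON scan_item (merchant_has_scan_id, is_registered);
    INSERT INTO merchant_has_scan VALUES (10, 1, 50), (20, 2, 70), (30, 3, 90);
    INSERT INTO merchant VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO scan_item VALUES (1, 1, 10), (2, 1, 10), (3, 0, 10),
                                 (4, 1, 20), (5, 0, 30);
""")

# Shape from the question: JOIN scan_item, then collapse with COUNT(DISTINCT).
joined = conn.execute("""
    SELECT m.id, ms.risk_score, COUNT(DISTINCT si.id)
    FROM merchant m
    JOIN merchant_has_scan ms ON m.last_scan_completed_id = ms.id
    JOIN scan_item si ON si.merchant_has_scan_id = ms.id AND si.is_registered = 1
    GROUP BY m.id ORDER BY m.id""").fetchall()

# Suggested shape: a correlated COUNT(*) per merchant, no join to explode and re-group.
correlated = conn.execute("""
    SELECT m.id, ms.risk_score,
           (SELECT COUNT(*) FROM scan_item si
            WHERE si.merchant_has_scan_id = ms.id AND si.is_registered = 1)
    FROM merchant m
    JOIN merchant_has_scan ms ON m.last_scan_completed_id = ms.id
    ORDER BY m.id""").fetchall()

print(joined)
print(correlated)
```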
I'm at a dead end: SQL query to get the last replied username and timestamp
This is the query I am trying to get to work:
SELECT t.id AS topic_id, t.title AS topic_title, t.content AS topic_content, t.created_at AS topic_created_at, t.updated_at AS topic_updated_at, t.user_id AS topic_user_id, c.id AS comment_id, c.content AS comment_content, c.created_at AS comment_created_at, max_c.username AS comment_username, u.username AS topic_username
FROM topics t
JOIN (SELECT c2.topic_id, c2.created_at, u2.username
FROM comments c2
JOIN users u2 ON c2.user_id = u2.id
JOIN topics t2 ON c2.topic_id = t2.id
ORDER BY c2.created_at desc) max_c ON t.id = max_c.topic_id
JOIN comments c ON max_c.created_at = c.created_at
JOIN users u ON u.id = t.user_id
ORDER BY c.created_at DESC
I'm pretty sure this part of the query is not correct:
SELECT c2.topic_id, c2.created_at, u2.username
FROM comments c2
JOIN users u2 ON c2.user_id = u2.id
JOIN topics t2 ON c2.topic_id = t2.id
ORDER BY c2.created_at desc
That query currently displays the following. But I want to group by created_at, or whatever is suitable, so that we only get the latest reply on each topic. If you can help, that would be amazing; I've spent about 5 hours trying to write this so far. I've attached my table migrations below.
# Dump of table comments
# ------------------------------------------------------------
CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` varchar(255) DEFAULT NULL,
`user_id` int(11) NOT NULL,
`topic_id` int(11) DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `comments_ibfk_2` (`topic_id`),
CONSTRAINT `comments_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
CONSTRAINT `comments_ibfk_2` FOREIGN KEY (`topic_id`) REFERENCES `topics` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Dump of table topics
# ------------------------------------------------------------
CREATE TABLE `topics` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
`content` text,
`user_id` int(11) NOT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
CONSTRAINT `topics_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Dump of table users
# ------------------------------------------------------------
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(255) DEFAULT 'NOT NULL',
`email` varchar(255) DEFAULT 'NOT NULL',
`password` char(60) DEFAULT 'NOT NULL',
`admin` int(11) NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
If I understand your question, the problem you are having is that the subquery should return only the last username and timestamp, but returns all of them instead. That being the case, you could just use LIMIT 1 after the ORDER BY to get the desired result:
SELECT c2.topic_id, c2.created_at, u2.username
FROM comments c2
JOIN users u2 ON c2.user_id = u2.id
JOIN topics t2 ON c2.topic_id = t2.id
ORDER BY c2.created_at desc
LIMIT 1
Please try the following...
SELECT topics.id AS topic_id,
topics.title AS topic_title,
topics.content AS topic_content,
topics.created_at AS topic_created_at,
topics.updated_at AS topic_updated_at,
topicsUsers.id AS topic_user_id,
topicsUsers.username AS topic_username,
comments.id AS comment_id,
comments.content AS comment_content,
maxCreatedAt AS comment_created_at,
commentsUsers.id AS comment_user_id,
commentsUsers.username AS comment_username
FROM topics
JOIN ( SELECT topic_id AS topic_id,
MAX( comments.created_at ) AS maxCreatedAt
FROM comments
GROUP BY topic_id ) maxCreatedAtFinder ON topics.id = maxCreatedAtFinder.topic_id
JOIN comments ON maxCreatedAtFinder.maxCreatedAt = comments.created_at
AND maxCreatedAtFinder.topic_id = comments.topic_id
JOIN users AS topicsUsers ON topicsUsers.id = topics.user_id
JOIN users AS commentsUsers ON commentsUsers.id = comments.user_id
ORDER BY maxCreatedAt DESC
As best I can tell from your question, you are looking for a list that shows the details of each topic (including the details of the user who initiated the topic) along with the most recent comment for that topic and the details of the user who posted that comment.
My statement follows a similar structure to the statement you supplied. I have removed the JOIN to topics from the subquery, since topic_id is the only detail from topics that we want for the subquery, and topic_id can also be obtained from comments. Obtaining the details of the user who posted the most recent comment for the topic would complicate the grouping and result in more joins being performed than if we joined comments to users outside of the subquery. All the subquery really needs to do is obtain the most recent value of created_at for each topic. As such, I have removed the JOIN to users and the associated field selections.
In the main statement I have maintained the INNER JOIN of the subquery to topics, which effectively appends the maximum value of created_at from comments to its corresponding record from topics. I then take the resulting dataset and join it to comments in such a way that a topic and its associated most recent value of created_at from comments has the contents of each comment for that topic with that created_at value appended to it.
Please note that whilst unlikely, it is possible for two comments belonging to a topic to be created at the same time and thus to have the same value of created_at. Both such records will be joined to the dataset being formed. In the absence of instructions to the contrary I have assumed that is desired behaviour and have allowed it.
I then take the resulting dataset and JOIN it to two instances of users, one based on the topic's user_id and the other based on the comment's user_id. This has the effect of appending to each record in our dataset the details of the user who created the topic and the details of the user(s) who posted the most recent comment(s).
This final dataset is then sorted into reverse chronological order, starting with the most recent record(s). The desired fields are then selected and returned by the statement.
If you have any questions or comments, then please feel free to post a comment accordingly.
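A minimal run of the "latest comment per topic" pattern, using Python's sqlite3 as a stand-in for MySQL. The tables are trimmed from the question's migrations and the rows are sample data. Topics without comments drop out because of the inner joins, and each remaining topic is paired with its newest comment and that comment's author.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, content TEXT, user_id INTEGER NOT NULL,
                           topic_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO topics VALUES (1, 'First', 1), (2, 'Second', 2), (3, 'No replies', 3);
    INSERT INTO comments VALUES
        (1, 'hi',              2, 1, '2020-01-01 10:00:00'),
        (2, 'latest on first', 3, 1, '2020-01-03 09:00:00'),
        (3, 'only one',        1, 2, '2020-01-02 12:00:00');
""")

rows = conn.execute("""
    SELECT topics.id, topics.title, topicsUsers.username,
           comments.id, comments.content, maxCreatedAt, commentsUsers.username
    FROM topics
    JOIN (SELECT topic_id, MAX(created_at) AS maxCreatedAt
          FROM comments GROUP BY topic_id) AS maxCreatedAtFinder
      ON topics.id = maxCreatedAtFinder.topic_id
    JOIN comments
      ON comments.topic_id = maxCreatedAtFinder.topic_id
     AND comments.created_at = maxCreatedAtFinder.maxCreatedAt
    JOIN users AS topicsUsers ON topicsUsers.id = topics.user_id
    JOIN users AS commentsUsers ON commentsUsers.id = comments.user_id
    ORDER BY maxCreatedAt DESC""").fetchall()
for row in rows:
    print(row)
```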
Optimising MySQL query on JOINed tables with GROUP BY and ORDER BY without using nested queries
This feels like a bit of a beginner SQL question to me, but here goes. This is what I'm trying to do:
- join three tables together: products, tags, and a linking table;
- aggregate the tags into a single comma-delimited field (hence the GROUP_CONCAT and the GROUP BY);
- limit the results (to 30);
- have the results in order of the 'created' date;
- avoid using subqueries where possible, as they're particularly unpleasant to code using an Active Record framework.
I've described the tables involved at the bottom of this post, but here's the query that I'm performing:
SELECT p.*, GROUP_CONCAT(pt.name)
FROM products p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
LIMIT 30;
There are about 280,000 products, 130 tags and 524,000 linking records, and I've ANALYZEd the tables. The problem is that it's taking over 80s to run (on decent hardware), which feels wrong to me.
Here are the EXPLAIN results:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | p | index | NULL | created | 4 | NULL | 30 | Using temporary
1 | SIMPLE | pt4p | ref | idx_product_tags_for_products | idx_product_tags_for_products | 3 | s.id | 1 | Using index
1 | SIMPLE | pt | eq_ref | PRIMARY | PRIMARY | 4 | pt4p.product_tag_id | 1 |
I think it's doing things in the wrong order, i.e. ORDERing the results after the join, using a large temporary table, and then LIMITing. The query plan in my head would go something like this:
- ORDER the products table using the 'created' key;
- step through each row, LEFT JOINing it against the other tables until the LIMIT of 30 has been reached.
This sounds simple, but it doesn't seem to work like that. Am I missing something?
CREATE TABLE `products` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`rating` float NOT NULL,
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`last_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`active` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `product_tags_for_products` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`product_id` mediumint(8) unsigned NOT NULL,
`product_tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_product_tags_for_products` (`product_id`,`product_tag_id`),
KEY `product_tag_id` (`product_tag_id`),
CONSTRAINT `product_tags_for_products_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`),
CONSTRAINT `product_tags_for_products_ibfk_2` FOREIGN KEY (`product_tag_id`) REFERENCES `product_tags` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `product_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Updated with profiling information at Salman A's request:
Status,Duration,CPU_user,CPU_system,Context_voluntary,Context_involuntary,Block_ops_in,Block_ops_out,Messages_sent,Messages_received,Page_faults_major,Page_faults_minor,Swaps,Source_function,Source_file,Source_line
starting,0.000124,0.000106,0.000015,0,0,0,0,0,0,0,0,0,NULL,NULL,NULL
"Opening tables",0.000022,0.000020,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_base.cc,4519
"System lock",0.000007,0.000004,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",lock.cc,258
"Table lock",0.000011,0.000009,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",lock.cc,269
init,0.000055,0.000054,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,2524
optimizing,0.000008,0.000006,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,833
statistics,0.000116,0.000051,0.000066,0,0,0,0,0,0,0,1,0,"unknown function",sql_select.cc,1024
preparing,0.000027,0.000023,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1046
"Creating tmp table",0.000054,0.000053,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1546
"Sorting for group",0.000018,0.000015,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1596
executing,0.000004,0.000002,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1780
"Copying to tmp table",0.061716,0.049455,0.013560,0,18,0,0,0,0,0,3680,0,"unknown function",sql_select.cc,1927
"converting HEAP to MyISAM",0.046731,0.006371,0.017543,3,5,0,3,0,0,0,32,0,"unknown function",sql_select.cc,10980
"Copying to tmp table on disk",10.700166,3.038211,1.191086,538,1230,1,31,0,0,0,65,0,"unknown function",sql_select.cc,11045
"Sorting result",0.777887,0.155327,0.618896,2,137,0,1,0,0,0,634,0,"unknown function",sql_select.cc,2201
"Sending data",0.000336,0.000159,0.000178,0,0,0,0,0,0,0,1,0,"unknown function",sql_select.cc,2334
end,0.000005,0.000003,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,2570
"removing tmp table",0.106382,0.000058,0.080105,4,9,0,11,0,0,0,0,0,"unknown function",sql_select.cc,10912
end,0.000015,0.000007,0.000007,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,10937
"query end",0.000004,0.000002,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,5083
"freeing items",0.000012,0.000012,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,6107
"removing tmp table",0.000010,0.000009,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,10912
"freeing items",0.000084,0.000022,0.000057,0,1,0,0,1,0,0,0,0,"unknown function",sql_select.cc,10937
"logging slow query",0.000004,0.000001,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1723
"logging slow query",0.000049,0.000031,0.000018,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1733
"cleaning up",0.000007,0.000005,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1691
The tables are: Products = 84.1 MiB (there are extra fields in the products table which I omitted for clarity), Tags = 32 KiB, Linking table = 46.6 MiB.
I would try limiting the number of products to 30 first and then joining with only those 30 products:
SELECT p.*, GROUP_CONCAT(pt.name) as tags
FROM (SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT 30) p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
I know you said no subqueries, but you did not explain why, and I don't see any other way to solve your issue.
Note that you can eliminate the subselect by putting it in a view:
CREATE VIEW v_last30products AS SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT 30;
Then the query is simplified to:
SELECT p.*, GROUP_CONCAT(pt.name) as tags
FROM v_last30products p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
Another issue: your n-to-n table product_tags_for_products does not make sense. I'd restructure it like so:
CREATE TABLE `product_tags_for_products` (
`product_id` mediumint(8) unsigned NOT NULL,
`product_tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`product_id`,`product_tag_id`),
CONSTRAINT `product_tags_for_products_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`),
CONSTRAINT `product_tags_for_products_ibfk_2` FOREIGN KEY (`product_tag_id`) REFERENCES `product_tags` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This should make the query faster by:
- shortening the key used (on InnoDB the PK is always included in secondary keys);
- allowing you to use the PK, which should be faster than using a secondary key.
More speed issues: if you replace the SELECT * with only the fields that you need (SELECT p.title, p.rating, ... FROM), that will also speed things up a little.
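A minimal sketch of the limit-first query, using Python's sqlite3 as a stand-in for MySQL (SQLite also provides GROUP_CONCAT). The schema is trimmed, the linking table uses the composite primary key suggested above, the index name idx_created is hypothetical, and the page size is 2 instead of 30 because the sample data is tiny. GROUP_CONCAT does not guarantee the order of the concatenated names, so the check sorts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, created TEXT NOT NULL);
    CREATE INDEX idx_created ON products (created);
    CREATE TABLE product_tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE product_tags_for_products (
        product_id INTEGER NOT NULL, product_tag_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, product_tag_id));
    INSERT INTO products VALUES (1, 'lamp', '2020-01-03'), (2, 'desk', '2020-01-01'),
                                (3, 'chair', '2020-01-02'), (4, 'rug', '2020-01-04');
    INSERT INTO product_tags VALUES (1, 'wood'), (2, 'metal'), (3, 'sale');
    INSERT INTO product_tags_for_products VALUES (2, 1), (2, 3), (3, 1), (1, 2);
""")

PAGE = 2  # stands in for the question's LIMIT 30
# The derived table picks the page first; only those products are joined and grouped.
rows = conn.execute("""
    SELECT p.id, p.title, GROUP_CONCAT(pt.name) AS tags
    FROM (SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT ?) p
    LEFT JOIN product_tags_for_products pt4p ON pt4p.product_id = p.id
    LEFT JOIN product_tags pt ON pt.id = pt4p.product_tag_id
    GROUP BY p.id
    ORDER BY p.created""", (PAGE,)).fetchall()
for row in rows:
    print(row)
```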
Ah, I see that none of the keys you GROUP BY on are BTREE; by default PRIMARY keys are hashes. It helps GROUP BY when there is an ordering index; otherwise it has to scan. What I mean is, I think it would help significantly if you added a BTREE-based index for p.id and p.created. In that case I think the engine will avoid having to scan/sort all those keys to execute the GROUP BY and ORDER BY.
Regarding filtering on tags (which you mentioned in the comments on Johan's answer), if the obvious
SELECT p.*, GROUP_CONCAT(DISTINCT pt.name) AS tags
FROM products p
JOIN product_tags_for_products pt4p2 ON (pt4p2.product_id = p.id)
JOIN product_tags pt2 ON (pt2.id = pt4p2.product_tag_id)
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
WHERE pt2.name IN ('some', 'tags', 'here')
GROUP BY p.id
ORDER BY p.created
LIMIT 30
doesn't run fast enough, you could always try this:
CREATE TEMPORARY TABLE products30
SELECT p.*
FROM products p
JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
WHERE pt.name IN ('some', 'tags', 'here')
GROUP BY p.id
ORDER BY p.created
LIMIT 30;
SELECT p.*, GROUP_CONCAT(pt.name) AS tags
FROM products30 p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created;
(I used a temp table because you said "no subqueries"; I don't know if they're any easier to use in an Active Record framework, but at least it's another way to do it.) The DISTINCT in the first query keeps a product that matches several of the filter tags from listing each of its tags more than once.
P.S. One really off-the-wall idea about your original problem: would it make any difference if you changed the GROUP BY p.id clause to GROUP BY p.created, p.id? Probably not, but I'd at least try it.
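A sketch of the two-step temp-table approach, using Python's sqlite3 as a stand-in for MySQL. SQLite needs the AS keyword in CREATE TEMP TABLE ... AS SELECT, while MySQL accepts the form shown above. The schema is trimmed and the data hypothetical. Step 1 picks the page of products that carry any wanted tag; step 2 attaches all of their tags, not only the matching ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, created TEXT NOT NULL);
    CREATE TABLE product_tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE product_tags_for_products (
        product_id INTEGER NOT NULL, product_tag_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, product_tag_id));
    INSERT INTO products VALUES (1, 'lamp', '2020-01-03'), (2, 'desk', '2020-01-01'),
                                (3, 'chair', '2020-01-02'), (4, 'rug', '2020-01-04');
    INSERT INTO product_tags VALUES (1, 'wood'), (2, 'metal'), (3, 'sale');
    INSERT INTO product_tags_for_products VALUES (2, 1), (2, 3), (3, 1), (1, 2);

    -- Step 1: the page of products that carry any of the wanted tags.
    CREATE TEMP TABLE products30 AS
    SELECT p.* FROM products p
    JOIN product_tags_for_products pt4p ON pt4p.product_id = p.id
    JOIN product_tags pt ON pt.id = pt4p.product_tag_id
    WHERE pt.name IN ('sale', 'metal')
    GROUP BY p.id
    ORDER BY p.created
    LIMIT 30;
""")

# Step 2: attach all tags of those products; the outer ORDER BY restores the order.
rows = conn.execute("""
    SELECT p.id, GROUP_CONCAT(pt.name) AS tags
    FROM products30 p
    LEFT JOIN product_tags_for_products pt4p ON pt4p.product_id = p.id
    LEFT JOIN product_tags pt ON pt.id = pt4p.product_tag_id
    GROUP BY p.id
    ORDER BY p.created""").fetchall()
for row in rows:
    print(row)
```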