Optimizing MySQL ORDER BY field of JOINed table - mysql

I have the following structure:
CREATE TABLE `tbl1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tid` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`time` datetime NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `tid_uid` (`tid`,`user_id`),
KEY `tid_time` (`tid`,`time`)
)
CREATE TABLE `tbl2` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned NOT NULL,
`field_to_order_by` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`),
KEY `field_to_order_by` (`field_to_order_by`)
)
I'm trying to perform the following query:
SELECT * FROM tbl1
LEFT JOIN tbl2 ON tbl1.user_id=tbl2.user_id
WHERE tbl1.tid=13 AND time > DATE_SUB(NOW(), INTERVAL 1 WEEK)
ORDER BY tbl2.field_to_order_by DESC LIMIT 200
The performance problem I'm facing is due to the ORDER BY of the joined table field. If I remove that or even replace it with a WHERE condition on the same field I'm getting a massive improvement.
How/Can I achieve reasonable performance with this combination of JOIN and ORDER BY or can this only be solved with de-normalization?
This is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tbl1 range tid_uid,tid_time tid_time 12 NULL 221664 Using where; Using temporary; Using filesort
1 SIMPLE tbl2 eq_ref user_id user_id 4 tbl1.user_id 1

Related

MySQL migration from 5.5 to 5.7 extremely slow

As mentioned in the title, I'm migrating from 5.5 to 5.7, and I have 1 query that totally broke:
SELECT
COUNT(*) as count,
detections.vendor
FROM (
SELECT id FROM telemetry_user_agents
WHERE date < DATE_SUB(NOW(), INTERVAL 0 DAY)
AND date > DATE_SUB(NOW(), INTERVAL 7 DAY)
) agents
LEFT JOIN telemetry_detections detections on detections.user_id = agents.id
GROUP BY detections.vendor
ORDER BY count DESC
LIMIT 20
On 5.5 the query takes 4 seconds, on 5.7 it takes 400 seconds (!), with exact same indexes and data. I've read about bugs in 5.7, but I'm pretty sure I can find a workaround with some query optimization and/or better indexes.
user_agents has 200k records, detections has 900k records
Here's the structures:
CREATE TABLE `telemetry_user_agents` (
`id` int(11) NOT NULL,
`date` datetime NOT NULL,
`program` text NOT NULL,
`country` text NOT NULL,
`ip` text NOT NULL,
`operating_system` text NOT NULL,
`bits` int(11) NOT NULL,
`version` text NOT NULL,
`license` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `telemetry_user_agents`
ADD PRIMARY KEY (`id`),
ADD KEY `date` (`date`) USING BTREE;
ALTER TABLE `telemetry_user_agents`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
CREATE TABLE `telemetry_detections` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`type` text NOT NULL,
`vendor` text NOT NULL,
`name` text NOT NULL,
`value` text NOT NULL,
`hash` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
ALTER TABLE `telemetry_detections`
ADD PRIMARY KEY (`id`),
ADD KEY `user_id` (`user_id`);
ALTER TABLE `telemetry_detections`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
And the explains:
5.5
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 95286 Using temporary; Using filesort
1 PRIMARY detections ref user_id user_id 4 agents.id 19
2 DERIVED telemetry_user_agents ALL date NULL NULL NULL 202047 Using where
5.7
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE telemetry_user_agents NULL ALL date NULL NULL NULL 200866 46.50 Using where; Using temporary; Using filesort
1 SIMPLE detections NULL ref user_id user_id 4 adlice_stats.telemetry_user_agents.id 19 100.00 NULL
Any help? :)
EDIT: After I created a bunch of indexes:
5.7:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE telemetry_user_agents NULL range date,date_2 date_2 5 NULL 131741 100.00 Using where; Using index; Using temporary; Using f...
1 SIMPLE detections NULL ref user_id user_id 4 adlice_stats.telemetry_user_agents.id 19 100.00 NULL

Avoid filesort with INNER JOIN + ORDER BY

I've been reading other posts but I didn't managed to fix my query.
Using DESC order the query is x20 times slower, I must improve that.
This is the query:
SELECT posts.post_id, posts.post_b_id, posts.post_title, posts.post_cont, posts.thumb, posts.post_user, boards.board_title_l, boards.board_title
FROM posts
INNER JOIN follow ON posts.post_b_id = follow.board_id
INNER JOIN boards ON posts.post_b_id = boards.board_id
WHERE follow.user_id =1
ORDER BY posts.post_id DESC
LIMIT 10
And these are the tables (Updated):
CREATE TABLE IF NOT EXISTS `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`post_b_id` int(11) unsigned NOT NULL,
`post_title` varchar(50) COLLATE utf8_bin NOT NULL,
`post_cont` text COLLATE utf8_bin NOT NULL,
`post_mintxt` varchar(255) COLLATE utf8_bin NOT NULL,
`post_type` char(3) COLLATE utf8_bin NOT NULL,
`thumb` varchar(200) COLLATE utf8_bin NOT NULL,
`post_user` varchar(16) COLLATE utf8_bin NOT NULL,
`published` enum('0','1') COLLATE utf8_bin NOT NULL,
`post_ip` varchar(94) COLLATE utf8_bin NOT NULL,
`post_ip_dat` int(11) unsigned NOT NULL,
`post_up` int(10) unsigned NOT NULL DEFAULT '0',
`post_down` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`post_id`),
KEY `post_b_id` (`post_b_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=405 ;
CREATE TABLE IF NOT EXISTS `boards` (
`board_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`board_title_l` varchar(19) COLLATE utf8_bin NOT NULL,
`board_user_id` int(10) unsigned NOT NULL,
`board_title` varchar(19) COLLATE utf8_bin NOT NULL,
`board_user` varchar(16) COLLATE utf8_bin NOT NULL,
`board_txt` tinyint(1) unsigned NOT NULL,
`board_img` tinyint(1) unsigned NOT NULL,
`board_vid` tinyint(1) unsigned NOT NULL,
`board_desc` varchar(100) COLLATE utf8_bin NOT NULL,
`board_mod_p` tinyint(3) unsigned NOT NULL DEFAULT '0',
`board_ip` varchar(94) COLLATE utf8_bin NOT NULL,
`board_dat_ip` int(11) unsigned NOT NULL,
PRIMARY KEY (`board_id`),
UNIQUE KEY `board_title_l` (`board_title_l`),
KEY `board_user_id` (`board_user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=89 ;
CREATE TABLE IF NOT EXISTS `follow` (
`user_id` int(10) unsigned NOT NULL,
`board_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`,`board_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Using default ASC order it only uses index and where, with DESC uses index, where, temporary and filesort.
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE follow ref user_id user_id 4 const 2 100.00 Using index; Using temporary; Using filesort
1 SIMPLE boards eq_ref PRIMARY PRIMARY 4 xxxx.follow.board_id 1 100.00
1 SIMPLE posts ref post_b_id post_b_id 4 xxxx.boards.board_id 3 100.00 Using where
How I can make the query receiving the results in DESC order without filesort and temporary.
UPDATE: I made a new query, no temporary or filesort, but type: index, filtered: 7340.00. Almost as fast as ASC order if the posts are at the end of the table, but slow if the posts that is searching are at the beginning. So seems better but it's not enough.
SELECT posts.post_id, posts.post_b_id, posts.post_title, posts.post_cont, posts.thumb, posts.post_user, boards.board_title_l, boards.board_title
FROM posts INNER JOIN boards ON posts.post_b_id = boards.board_id
WHERE posts.post_b_id
IN (
SELECT follow.board_id
FROM follow
WHERE follow.user_id = 1
)
ORDER BY posts.post_id DESC
LIMIT 10
Explain:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY posts index post_b_id PRIMARY 8 NULL 10 7340.00 Using where
1 PRIMARY boards eq_ref PRIMARY PRIMARY 4 xxxx.posts.post_b_id 1 100.00
2 DEPENDENT SUBQUERY follow eq_ref user_id user_id 8 const,func 1 100.00 Using index
UPDATE: Explain for the query from dened's answer:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2>ALL NULL NULL NULL NULL 10 100.00
1 PRIMARY posts eq_ref PRIMARY,post_b_id PRIMARY 4 sq.post_id 1 100.00
1 PRIMARY boards eq_ref PRIMARY PRIMARY 4 xxxx.posts.post_b_id 1 100.00
2 DERIVED follow ref PRIMARY PRIMARY 4 1 100.00 Using index; Using temporary; Using filesort
2 DERIVED posts ref post_b_id post_b_id 4 xxxx.follow.board_id 6 100.00 Using index
Times:
Original query no order (ASC): 0.187500 seconds
Original query DESC: 2.812500 seconds
Second query posts at the end (DESC): 0.218750 seconds
Second query posts at the beginning (DESC): 3.293750 seconds
dened's query DESC: 0.421875 seconds
dened's query no order (ASC): 0.323750 seconds
Interesting note, if I add ORDER BY ASC is as slow as DESC.
Alter the table order will be a god way, but as I said in the comments I wasn't able to do that.
You can help MySQL optimizer by moving all the filtering work to a subquery that accesses only indices (manipulating indices is usually much faster than manipulating other data), and fetching rest of the data in the outermost query:
SELECT posts.post_id,
posts.post_b_id,
posts.post_title,
posts.post_cont,
posts.thumb,
posts.post_user,
boards.board_title_l,
boards.board_title
FROM (SELECT post_id
FROM posts
JOIN follow
ON posts.post_b_id = follow.board_id
WHERE follow.user_id = 1
ORDER BY post_id DESC
LIMIT 10) sq
JOIN posts
ON posts.post_id = sq.post_id
JOIN boards
ON boards.board_id = posts.post_b_id
Note that I omit ORDER BY posts.post_id DESC from the outer query, because it is usually faster to sort the final result in your code rather than sorting using a MySQL query (MySQL often uses filesort for that).
P.S. You can replace the unique key in the follow table with a primary key.
Increasing the sort_buffer_size parameter will increase the amount of memory MySQL uses before resorting to a temporary disk file and should help considerably.

Slow query when using GROUP BY

I have the following query:
SELECT id, user_id, cookieId, text_date
FROM `_history`
WHERE text_date BETWEEN '2014-09-01' AND '2014-10-01' AND user_id = 1
GROUP BY cookieId
ORDER BY id DESC
My table schema:
CREATE TABLE `_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`cookieId` varchar(50) NOT NULL,
`text_from` varchar(50) NOT NULL,
`text_body` text NOT NULL,
`text_date` datetime NOT NULL,
`aName` varchar(50) NOT NULL,
`hasArrived` enum('0','1') NOT NULL COMMENT,
`agent_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `cookieId` (`cookieId`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
An EXPLAN yields this:
1 SIMPLE _history ref cookieId,user_id user_id 4 const 49837 Using where; Using temporary; Using filesort
Sometimes the query takes 2 seconds and sometimes its up to 5s.
Any ideas how to make this run faster?
The group by does nothing at the moment so drop it.
The user_id already has an index on it, so the query and sort on it are fine.
The text_date has no index on it, adding an index on it should speed up your query.
If this query occurs often, add a composite index on both user_id and text_date.
eg.
create index idx_text_date on `_history` (text_date);
Based on the comments, the query should look like this:
SELECT cookieId, max(id), max(user_id), max(text_date)
FROM `_history`
WHERE text_date BETWEEN '2014-09-01' AND '2014-10-01'
AND user_id = 1
GROUP BY cookieId
ORDER BY id DESC
And the index should look like this:
create index idx__history_text_date_cookieId on `_history` (text_date, cookieId);
Create a composite index on (user_id, cookieId).
CREATE TABLE `_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`cookieId` varchar(50) NOT NULL,
`text_from` varchar(50) NOT NULL,
`text_body` text NOT NULL,
`text_date` datetime NOT NULL,
`aName` varchar(50) NOT NULL,
`hasArrived` enum('0','1') NOT NULL COMMENT,
`agent_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `cookieId` (`cookieId`),
KEY `user_id_X_cookieId` (`user_id`, `cookieId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It will then be able to use the user_id index to find the rows, and use the cookieId suffix of that index to group them.
When you have this index, you don't need the user_id index, because a prefix of a composite index can be used as an index.

mySQL query is very slow after using DISTINCT and GROUP BY?

I have tables with following structure:
-- Table structure for table `temp_app`
--
CREATE TABLE IF NOT EXISTS `temp_app` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`vid` int(5) NOT NULL,
`num` varchar(64) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `num` (`num`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=69509;
-- Table structure for table `inv_flags`
--
CREATE TABLE IF NOT EXISTS `inv_flags` (
`num` varchar(64) NOT NULL,
`vid` int(11) NOT NULL,
`f_special` tinyint(1) NOT NULL, /*0 or 1*/
`f_inserted` tinyint(1) NOT NULL, /*0 or 1*/
`f_notinserted` tinyint(1) NOT NULL, /*0 or 1*/
`userID` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `num` (`num`),
KEY `userID` (`userID`),
KEY `vid` (`vid`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Execution time of the following query is 9 seconds to display 30 records. What is wrong?
SELECT date_format(ifs.`timestamp`,'%y/%m/%d') as `date`
,count(DISTINCT ta.num) as inserted /*Unique nums*/
,SUM(ifs.f_notinserted) as not_inserted
,SUM(ifs.f_special) as special
,count(ta.num) as links /*All nums*/
from inventory_flags ifs
LEFT JOIN temp_app ta ON ta.num = ifs.num AND ta.vid = ifs.vid
WHERE ifs.userID = 3
GROUP BY date(ifs.`timestamp`) DESC LIMIT 30
EXPLAIN RESULT
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ifs ref userID userID 4 const 12153 Using where
1 SIMPLE ta ref vid,num num 194 ifs.num 1
COUNT DISTINCT can sometimes cause rotten performance with MySql. Try this instead:
select count(*) from (select distinct...
as it can sometimes prevent MySql from writing the entire interim result to disk.
Here is the MySql bug info:
http://bugs.mysql.com/bug.php?id=21849

Why is MySQL using filesort in this case?

Table Structure:
CREATE TABLE IF NOT EXISTS `newsletters`
(
`id` int(11) NOT NULL auto_increment,
`last_update` int(11) default NULL,
`status` int(11) default '0',
`message_id` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `status` (`status`),
KEY `message_id` (`message_id`),
KEY `last_update` (`last_update`)
)
ENGINE=MyISAM DEFAULT CHARSET=latin1;
The Query:
SELECT id, last_update
FROM newsletters
WHERE status = 1
ORDER BY last_update DESC
LIMIT 0, 100
newsletters table has over 3 million records
query takes over 26 seconds to execute
Query explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE newsletters range status status 5 NULL 3043354 Using where; Using filesort
So why is it not using filesort, and how is it a range query?
It's using filesort to sort on last_update. You can avoid the filesort that by changing the index to status, last_update, so MySQL finds all rows with status 1 in the right order.
To further optimize, change the index to status, last_update, id. That allows MySQL to satisfy the query just by looking at the index, without a table lookup.
CREATE INDEX idx_newsletters_status
ON newsletters(status, last_update, id);