MySQL: optimize a slow query with EXPLAIN

I'm working on MySQL 5.5.29-0ubuntu0.12.04.1.
I need to create a query that can sort results either by date or by score.
I read the documentation and the posts here on Stack Overflow (specifically this) about how to optimize a query, but I'm still struggling to do it well.
The key finding is that, to avoid a temporary table, the ORDER BY or GROUP BY must contain only columns from the first table in the join queue. That is why I use the STRAIGHT_JOIN clause and two slightly different queries.
To avoid confusion, I'm going to assign a number to each query configuration:
1. order by date with STRAIGHT_JOIN clause
2. order by score with STRAIGHT_JOIN clause
3. order by date without STRAIGHT_JOIN clause
4. order by score without STRAIGHT_JOIN clause
Here is query 1, which takes about 2.5 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM item
INNER JOIN score ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY zen_time DESC
LIMIT 0, 10
Here is query 2 (the first two joined tables are swapped and the ordering column is different), which takes only about 0.01 seconds to complete:
SELECT STRAIGHT_JOIN item.id AS id
FROM score
INNER JOIN item ON item.id = score.item_id
LEFT JOIN url ON item.url_id = url.id
LEFT JOIN doc ON url.doc_id = doc.id
INNER JOIN feed ON feed.id = item.feed_id
INNER JOIN user_feed ON feed.id = user_feed.feed_id AND score.user_id = user_feed.user_id
LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id
JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id
WHERE score.user_id = 1 AND user_feed.id = 7
ORDER BY score DESC
LIMIT 0, 10
Following are the EXPLAIN and profiler results for the queries.
[Images: EXPLAIN output for queries 1 to 4; profiler results for queries 1 to 4]
Here are the table definitions:
CREATE TABLE `doc` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`md5` char(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`title` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `Title_url_index` (`title`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` bigint(20) unsigned NOT NULL,
`url_id` bigint(20) unsigned DEFAULT NULL,
`md5` char(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `Md5_index` (`md5`),
KEY `Zen_time_index` (`zen_time`),
KEY `Feed_index` (`feed_id`),
KEY `Url_index` (`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `score` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
`score` float DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`),
KEY Score_index (`score`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `star` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `unseen` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`item_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `User_item_index` (`user_id`,`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `url` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`doc_id` bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY Doc_index (`doc_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_Email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_feed` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned NOT NULL,
`feed_id` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `User_feed_index` (`user_id`,`feed_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here are the row counts for the tables involved in the query:
Score: 68657
Item: 197602
Url: 198354
Doc: 186113
Feed: 754
User_feed: 721
Star: 0
Unseen: 150762
Which approach should I take, given that my program needs to order results both by zen_time and by score as fast as possible?

Because of the different query speeds, I decided to do a more detailed analysis based on the various results I want to achieve.
I need four result sets:
1. Select all the items from a specific feed, ordered by SCORE.score (intelligent order)
2. Select all the items from a specific feed, ordered by ITEM.zen_time (time order)
3. Select all the items, ordered by SCORE.score (intelligent order)
4. Select all the items, ordered by ITEM.zen_time (time order)
So the query has to be adapted to each of those cases. Its variable parts are:
STRAIGHT_JOIN yes/no
First JOIN table score/item
WHERE condition on specific feed yes/no
ORDER BY score/zen_time
All of the tests were run with SELECT SQL_NO_CACHE.
Following are the results:
[Image: test results]
Now it's clear what I have to do:
1. No STRAIGHT_JOIN, first JOIN table SCORE
2. No STRAIGHT_JOIN, first JOIN table SCORE
3. STRAIGHT_JOIN (I beat the MySQL optimizer here :D), first JOIN table SCORE
4. STRAIGHT_JOIN (I beat the MySQL optimizer here :D), first JOIN table ITEM
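To keep these four decisions in one place in application code, they can be folded into a small query builder. This is a minimal sketch in Python: `build_item_query` and `COMMON_JOINS` are hypothetical names, the joins are copied from the queries above, and the `%s` placeholders assume a driver such as PyMySQL or MySQL Connector/Python. It only assembles the SQL string; the timings that justify each choice are the ones measured above, not something this code checks.

```python
# Joins shared by every variant (copied from the queries in the question).
COMMON_JOINS = (
    "LEFT JOIN url ON item.url_id = url.id\n"
    "LEFT JOIN doc ON url.doc_id = doc.id\n"
    "INNER JOIN feed ON feed.id = item.feed_id\n"
    "INNER JOIN user_feed ON feed.id = user_feed.feed_id"
    " AND score.user_id = user_feed.user_id\n"
    "LEFT JOIN star ON item.id = star.item_id AND score.user_id = star.user_id\n"
    "JOIN unseen ON item.id = unseen.item_id AND score.user_id = unseen.user_id\n"
)


def build_item_query(by_feed: bool, order_by: str) -> str:
    """Hypothetical helper: return the variant the benchmarks found fastest."""
    if order_by not in ("score", "zen_time"):
        raise ValueError("order_by must be 'score' or 'zen_time'")

    # Result sets 3 and 4 (no feed filter) used STRAIGHT_JOIN; 1 and 2 did not.
    hint = "" if by_feed else "STRAIGHT_JOIN "

    # Only result set 4 (all items, time order) starts the join from item.
    if not by_feed and order_by == "zen_time":
        from_clause = "FROM item\nINNER JOIN score ON item.id = score.item_id\n"
    else:
        from_clause = "FROM score\nINNER JOIN item ON item.id = score.item_id\n"

    where = "WHERE score.user_id = %s"
    if by_feed:
        where += " AND user_feed.id = %s"

    order_col = "score.score" if order_by == "score" else "item.zen_time"

    return (
        f"SELECT {hint}item.id AS id\n"
        + from_clause
        + COMMON_JOINS
        + where
        + f"\nORDER BY {order_col} DESC\nLIMIT %s, %s"
    )


if __name__ == "__main__":
    print(build_item_query(by_feed=False, order_by="zen_time"))
```

With a DB-API cursor, `cursor.execute(build_item_query(True, "score"), (1, 7, 0, 10))` would correspond to the feed-filtered query for user 1 and user_feed 7 with LIMIT 0, 10.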

Related

I need to optimize tables and queries

I have 3 tables, info, data and link, and this query on them:
select *
from data,link,info
where link.info_id = info.id and link.data_id = data.id
Please suggest optimizations for:
a) the tables
b) the query.
Statements for creating the tables:
CREATE TABLE info (
id int(11) NOT NULL auto_increment,
name varchar(255) default NULL,
`desc` text default NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=cp1251;
CREATE TABLE data (
id int(11) NOT NULL auto_increment,
date date default NULL,
value INT(11) default NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=cp1251;
CREATE TABLE link (
data_id int(11) NOT NULL,
info_id int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=cp1251;
Thanks!
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax:
select *
from data d join
link l
on l.data_id = d.id join
info i
on l.info_id = i.id;
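A quick way to check that this rewrite changes only readability, not the result, is to run both forms against a tiny data set. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, with trimmed columns and made-up rows; `compare_join_styles` is a hypothetical helper.

```python
import sqlite3


def compare_join_styles():
    """Run the comma-join and explicit-JOIN forms on the same toy data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE info (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE data (id INTEGER PRIMARY KEY, value INTEGER);
        CREATE TABLE link (data_id INTEGER NOT NULL, info_id INTEGER NOT NULL);
        INSERT INTO info VALUES (1, 'first'), (2, 'second');
        INSERT INTO data VALUES (10, 100), (20, 200), (30, 300);
        INSERT INTO link VALUES (10, 1), (20, 1), (30, 2);
    """)
    # Original form: tables separated by commas, join conditions in WHERE.
    comma_rows = con.execute("""
        SELECT d.id, i.name
        FROM data d, link l, info i
        WHERE l.info_id = i.id AND l.data_id = d.id
        ORDER BY d.id
    """).fetchall()
    # Rewritten form: explicit JOIN ... ON.
    join_rows = con.execute("""
        SELECT d.id, i.name
        FROM data d
        JOIN link l ON l.data_id = d.id
        JOIN info i ON l.info_id = i.id
        ORDER BY d.id
    """).fetchall()
    con.close()
    return comma_rows, join_rows


if __name__ == "__main__":
    print(compare_join_styles())
```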
Second, for this query your indexes are probably fine. I would also recommend a primary key index on link:
CREATE TABLE link (
data_id int(11) NOT NULL,
info_id int(11) NOT NULL,
PRIMARY KEY (data_id, info_id)
);
This is a good idea in general, even if it is not specific to this query.
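One concrete effect of that composite primary key is that the same (data_id, info_id) pair cannot be stored twice, so the join cannot fan out on duplicate links. A small sketch, again using sqlite3 as a stand-in for MySQL, with made-up rows; `insert_links` is a hypothetical helper.

```python
import sqlite3


def insert_links(pairs):
    """Insert (data_id, info_id) pairs into a link table keyed on both columns."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE link (
            data_id INTEGER NOT NULL,
            info_id INTEGER NOT NULL,
            PRIMARY KEY (data_id, info_id)
        )
    """)
    rejected = []
    for pair in pairs:
        try:
            con.execute("INSERT INTO link (data_id, info_id) VALUES (?, ?)", pair)
        except sqlite3.IntegrityError:
            # The composite primary key refuses a second copy of the same pair.
            rejected.append(pair)
    stored = con.execute(
        "SELECT data_id, info_id FROM link ORDER BY data_id, info_id"
    ).fetchall()
    con.close()
    return stored, rejected


if __name__ == "__main__":
    print(insert_links([(10, 1), (20, 1), (10, 1)]))
```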

MySQL binary tree select query optimization

I have an accounts table. Every time an account is created, I push its tree_right and tree_left values into the account_device_tree table.
Now I have more than 10M accounts under the first parent account. When I select all the subaccounts, the query takes more than a minute to execute.
My query is given below:
select *
FROM
accounts acc
JOIN
account_device_tree ON acc.tree_id = account_device_tree.tree_id
WHERE
(acc.account_id = 1 OR (account_device_tree.tree_left >= 1 AND account_device_tree.tree_right <= 748534))
I need to optimize it as much as possible.
Schema of account_device_tree:
CREATE TABLE `account_device_tree` (
`tree_id` int(11) NOT NULL AUTO_INCREMENT,
`tree_left` int(11) NOT NULL,
`tree_right` int(11) NOT NULL,
PRIMARY KEY (`tree_id`),
KEY `tree_left` (`tree_left`),
KEY `tree_right` (`tree_right`)
) ENGINE=InnoDB AUTO_INCREMENT=388173 DEFAULT CHARSET=latin1
Schema of the accounts table:
CREATE TABLE `accounts` (
`account_id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) NOT NULL,
`name` varchar(64) NOT NULL,
`tree_id` int(11) NOT NULL,
PRIMARY KEY (`account_id`),
UNIQUE KEY `tree_id` (`tree_id`),
KEY `name` (`name`),
KEY `idx_parent_id` (`parent_id`)
) ENGINE=InnoDB AUTO_INCREMENT=389739 DEFAULT CHARSET=latin1
It's hard to suggest something without the table structures and the query plan.
But rewriting the query with UNION ALL instead of OR tends to optimize better, assuming the right indexes exist on the table.
select *
FROM
accounts acc
JOIN
account_device_tree ON acc.tree_id = account_device_tree.tree_id
WHERE
acc.account_id = 1
UNION ALL
select *
FROM
accounts acc
JOIN
account_device_tree ON acc.tree_id = account_device_tree.tree_id
WHERE
account_device_tree.tree_left >= 1
AND
account_device_tree.tree_right <= 748534
AND
acc.account_id <> 1 -- skip the row the first branch already returned
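One thing to watch with this rewrite: UNION ALL does not remove duplicates, so a row that satisfies both branches (here, account 1 if its own tree range also matches the range condition) comes back twice, whereas the OR version returns it once. Excluding the first branch's rows from the second branch (or using UNION, at the cost of a de-duplication step) keeps the results identical. The sketch below checks this with sqlite3 as a stand-in for MySQL; the tree values are made up and the range bounds are shrunk to fit them.

```python
import sqlite3

SETUP = """
CREATE TABLE account_device_tree (
    tree_id INTEGER PRIMARY KEY,
    tree_left INTEGER NOT NULL,
    tree_right INTEGER NOT NULL
);
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    tree_id INTEGER NOT NULL UNIQUE
);
-- Made-up nested-set values: tree 1 spans 1..10 and contains trees 2 and 3.
INSERT INTO account_device_tree VALUES (1, 1, 10), (2, 2, 3), (3, 4, 9), (4, 11, 12);
INSERT INTO accounts VALUES (1, 0, 'root', 1), (2, 1, 'a', 2), (3, 1, 'b', 3), (4, 0, 'other', 4);
"""

BASE = ("SELECT acc.account_id FROM accounts acc "
        "JOIN account_device_tree t ON acc.tree_id = t.tree_id ")


def compare(account_id=1, left=1, right=10):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    or_rows = con.execute(
        BASE + "WHERE acc.account_id = ? OR (t.tree_left >= ? AND t.tree_right <= ?)",
        (account_id, left, right)).fetchall()
    union_all_rows = con.execute(
        BASE + "WHERE acc.account_id = ? UNION ALL "
        + BASE + "WHERE t.tree_left >= ? AND t.tree_right <= ?",
        (account_id, left, right)).fetchall()
    # Second branch skips the row the first branch already returned.
    union_excl_rows = con.execute(
        BASE + "WHERE acc.account_id = ? UNION ALL "
        + BASE + "WHERE t.tree_left >= ? AND t.tree_right <= ? AND acc.account_id <> ?",
        (account_id, left, right, account_id)).fetchall()
    con.close()
    # Sort so the comparison ignores row order but keeps duplicates.
    return (sorted(r[0] for r in or_rows),
            sorted(r[0] for r in union_all_rows),
            sorted(r[0] for r in union_excl_rows))


if __name__ == "__main__":
    print(compare())
```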

How can I query for rows with latest date and do an inner join on a second table?

All the examples I've seen show how to do an inner join using an alias to get the rows with the latest date. I can do that with my data, but I also want to do an inner join on another table and can't figure out how to do both in the same query.
Here are the two tables:
CREATE TABLE `titles` (
`titleID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`titlename` tinytext NOT NULL,
`url` varchar(255) DEFAULT '',
`category` int(2) unsigned NOT NULL,
`postdate` date NOT NULL,
PRIMARY KEY (`titleID`),
KEY `category` (`category`),
CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`category`) REFERENCES `categories` (`catid`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1;
CREATE TABLE `stats` (
`statid` int(11) unsigned NOT NULL AUTO_INCREMENT,
`score` decimal(3,2) DEFAULT NULL,
`views` int(11) unsigned DEFAULT NULL,
`favs` int(11) DEFAULT NULL,
`comments` int(11) DEFAULT NULL,
`updatedate` date NOT NULL,
`title` int(11) unsigned NOT NULL,
PRIMARY KEY (`statid`),
KEY `title` (`title`),
CONSTRAINT `stats_ibfk_1` FOREIGN KEY (`title`) REFERENCES `titles` (`titleID`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
My goals:
1) I want a query that gives me all the latest stats for each title.
2) I want to see the text name of the title (from the titles table).
I can use this query to get the latest score for each title.
select t.score, t.views, t.favs, t.comments, t.updatedate, t.title
from stats t
inner join (
select title, max(updatedate) as updatedate
from stats
GROUP BY title
) tm on t.title = tm.title and t.updatedate = tm.updatedate
But the problem with this query is that it displays the title column from stats, which is an int. I want the text name of the title.
I can do this to get the title name and the score, but then I'm not getting the row with the latest date.
select titlename, score, updatedate
from stats
inner join titles
on titleid = title
How can I write a query that achieves both my goals?
You need to join the titles table as well:
select
s1.score,
s1.views,
s1.favs,
s1.comments,
s1.updatedate,
t.titlename
from titles t
join stats s1 on s1.title = t.titleID
join (
select title, max(updatedate) as updatedate
from stats
GROUP BY title
) s2 on s2.title = s1.title and s1.updatedate = s2.updatedate
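To see the shape of the result, here is the same query run against a few made-up rows, using Python's sqlite3 as a stand-in for MySQL (columns trimmed to the ones used; `latest_stats` is a hypothetical helper).

```python
import sqlite3


def latest_stats():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE titles (titleID INTEGER PRIMARY KEY, titlename TEXT NOT NULL);
        CREATE TABLE stats (
            statid INTEGER PRIMARY KEY,
            score REAL,
            views INTEGER,
            updatedate TEXT NOT NULL,   -- ISO dates compare correctly as text
            title INTEGER NOT NULL REFERENCES titles (titleID)
        );
        INSERT INTO titles VALUES (1, 'Alpha'), (2, 'Beta');
        INSERT INTO stats VALUES
            (1, 1.5, 10, '2013-01-01', 1),
            (2, 2.5, 20, '2013-02-01', 1),
            (3, 3.0, 5, '2013-01-15', 2);
    """)
    rows = con.execute("""
        SELECT t.titlename, s1.score, s1.views, s1.updatedate
        FROM titles t
        JOIN stats s1 ON s1.title = t.titleID
        -- s2 holds the latest updatedate per title
        JOIN (SELECT title, MAX(updatedate) AS updatedate
              FROM stats
              GROUP BY title) s2
          ON s2.title = s1.title AND s1.updatedate = s2.updatedate
        ORDER BY t.titlename
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(latest_stats())
```

If two stats rows for the same title share the latest updatedate, both are returned; the query picks rows by date, not one row per title.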

performance issue when joining two large tables

I have a multilingual CMS that uses a translations table (70k rows) containing all of the texts:
CREATE TABLE IF NOT EXISTS `translations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`key` int(11) NOT NULL,
`lang` int(11) NOT NULL,
`value` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
KEY `key` (`key`,`lang`)
) ENGINE=MyISAM
and a products table (4k rows) containing products with translation keys:
CREATE TABLE IF NOT EXISTS `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name_trans_id` int(11) NOT NULL,
`desc_trans_id` int(11) DEFAULT NULL,
`text_trans_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name_trans_id`),
KEY `desc_index` (`desc_trans_id`),
KEY `text_index` (`text_trans_id`)
) ENGINE=MyISAM
Now I need to get the top 20 products in alphabetical order. To do that I use this query:
SELECT
SQL_CALC_FOUND_ROWS
dt_table.* ,
t_name.value as 'name'
FROM
products as dt_table
LEFT JOIN
`translations` as t_name on dt_table.name_trans_id = t_name.key
WHERE
(t_name.lang = 1 OR t_name.lang is null)
ORDER BY
name ASC LIMIT 0, 20
It takes forever.
Any help optimizing this query/tables will be appreciated.
Thank you.
Try to change your structure of translations table to:
CREATE TABLE IF NOT EXISTS `translations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`key` int(11) NOT NULL,
`lang` int(11) NOT NULL DEFAULT 0,
`value` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
KEY `lang` (`lang`),
KEY `key` (`key`,`lang`),
FULLTEXT idx (`value`)
) ENGINE=InnoDB;
because you really need lang to be indexed as soon as you use it in a WHERE clause.
And try to change your query a little bit:
SELECT
dt_table.* ,
t_name.value as 'name',
SUBSTR(t_name.value, 1, 100) as text_order
FROM
products as dt_table
LEFT JOIN (
SELECT `key`, value FROM `translations`
WHERE lang = 1 OR lang is null
) as t_name
ON dt_table.name_trans_id = t_name.key
ORDER BY
text_order ASC LIMIT 0, 20
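Moving the lang filter into the derived table also changes which products come back, not just the speed. In the original query the filter sits in the WHERE clause, so a product whose name exists only in another language joins to a row with lang = 2, fails the filter and disappears from the list; with the filter inside the joined subquery the product stays, with a NULL name. The sketch below shows the difference with sqlite3 as a stand-in for MySQL and made-up rows (`key` is quoted because it is a keyword, and the SUBSTR ordering column is left out).

```python
import sqlite3

SETUP = """
CREATE TABLE translations (
    id INTEGER PRIMARY KEY,
    "key" INTEGER NOT NULL,
    lang INTEGER NOT NULL,
    value TEXT
);
CREATE TABLE products (id INTEGER PRIMARY KEY, name_trans_id INTEGER NOT NULL);
INSERT INTO translations ("key", lang, value) VALUES
    (100, 1, 'Banana'), (100, 2, 'Banane'),
    (200, 1, 'Apple'),
    (300, 2, 'Kirsche');  -- product 3 has no lang = 1 name
INSERT INTO products VALUES (1, 100), (2, 200), (3, 300), (4, 400);  -- 400: no translation
"""


def compare_filters():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    # Filter in WHERE, as in the question.
    in_where = con.execute("""
        SELECT p.id, t.value
        FROM products p
        LEFT JOIN translations t ON p.name_trans_id = t."key"
        WHERE t.lang = 1 OR t.lang IS NULL
        ORDER BY p.id
    """).fetchall()
    # Filter inside the joined subquery, as in the answer.
    in_subquery = con.execute("""
        SELECT p.id, t.value
        FROM products p
        LEFT JOIN (SELECT "key", value FROM translations
                   WHERE lang = 1 OR lang IS NULL) t
          ON p.name_trans_id = t."key"
        ORDER BY p.id
    """).fetchall()
    con.close()
    return in_where, in_subquery


if __name__ == "__main__":
    print(compare_filters())
```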
And if you really need SQL_CALC_FOUND_ROWS (I don't understand why you need a counter for translation items),
you can run another query right after the first one:
SELECT COUNT(*) FROM products;
I am pretty sure you will be surprised with performance :-)

MySQL returns NULL for a record when JOIN and COUNT are used, but a plain SELECT works fine. Why?

Table 1
CREATE TABLE IF NOT EXISTS `com_msg` (
`msg_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`msg_to` int(10) NOT NULL,
`msg_from` int(10) NOT NULL,
`msg_new` tinyint(1) unsigned NOT NULL DEFAULT '1',
`msg_content` varchar(300) NOT NULL,
`msg_date` date NOT NULL,
`bl_sender` tinyint(1) unsigned NOT NULL DEFAULT '0',
`bl_recip` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`msg_id`),
UNIQUE KEY `msg_id` (`msg_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Table 2
CREATE TABLE IF NOT EXISTS `ac_vars` (
`user_id` int(10) unsigned NOT NULL,
`ac_ballance` smallint(3) unsigned NOT NULL DEFAULT '0',
`prof_views` mediumint(8) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`user_id`),
UNIQUE KEY `id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When I use this query:
SELECT ac_ballance, prof_views, COUNT( msg_id ) AS messages
FROM ac_vars
INNER JOIN com_msg ON user_id = msg_to
WHERE user_id =".$userid." AND com_msg.msg_new =1;
I get:
ac_ballance = NULL (incorrect)
prof_views = NULL (incorrect)
messages = 0 (correct)
But with a SELECT statement on ac_vars alone I get the correct values. What is the correct way of doing this?
You want rows from the table ac_vars even when there's no corresponding row in the table com_msg.
So you must use a LEFT JOIN:
SELECT ac_ballance, prof_views, COUNT( msg_id ) AS messages
FROM ac_vars
LEFT JOIN com_msg
ON user_id = msg_to AND com_msg.msg_new =1
WHERE user_id =".$userid.";
Please note that the condition
com_msg.msg_new =1
has to be part of the JOIN condition and not of the WHERE clause: for a user without new messages, the preserved ac_vars row has msg_new either NULL or not 1, so the same condition in WHERE would filter that row out again.
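The difference between the three variants can be reproduced with a few rows. This sketch uses Python's sqlite3 as a stand-in for MySQL (SQLite likewise returns one row with NULL in the non-aggregated columns when an aggregate query without GROUP BY matches nothing), bound parameters instead of the PHP string concatenation, and made-up data: user 7 has one message, already read. `run_all` and `QUERIES` are hypothetical names.

```python
import sqlite3

SETUP = """
CREATE TABLE com_msg (
    msg_id INTEGER PRIMARY KEY,
    msg_to INTEGER NOT NULL,
    msg_from INTEGER NOT NULL,
    msg_new INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE ac_vars (
    user_id INTEGER PRIMARY KEY,
    ac_ballance INTEGER NOT NULL DEFAULT 0,
    prof_views INTEGER NOT NULL DEFAULT 0
);
INSERT INTO ac_vars VALUES (7, 50, 3);
INSERT INTO com_msg VALUES (1, 7, 8, 0);  -- one message to user 7, already read
"""

QUERIES = {
    # Question: INNER JOIN, so no matching message means no input rows at all.
    "inner_join": """
        SELECT ac_ballance, prof_views, COUNT(msg_id)
        FROM ac_vars INNER JOIN com_msg ON user_id = msg_to
        WHERE user_id = ? AND com_msg.msg_new = 1""",
    # Answer: LEFT JOIN with the msg_new test in ON keeps the ac_vars row.
    "left_join_on": """
        SELECT ac_ballance, prof_views, COUNT(msg_id)
        FROM ac_vars LEFT JOIN com_msg ON user_id = msg_to AND com_msg.msg_new = 1
        WHERE user_id = ?""",
    # LEFT JOIN but msg_new tested in WHERE: the row is filtered out again.
    "left_join_where": """
        SELECT ac_ballance, prof_views, COUNT(msg_id)
        FROM ac_vars LEFT JOIN com_msg ON user_id = msg_to
        WHERE user_id = ? AND com_msg.msg_new = 1""",
}


def run_all(user_id=7):
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    results = {name: con.execute(sql, (user_id,)).fetchone()
               for name, sql in QUERIES.items()}
    con.close()
    return results


if __name__ == "__main__":
    for name, row in run_all().items():
        print(name, row)
```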
Note
Adding
GROUP BY ac_ballance, prof_views
is not needed in MySQL here, because the values in those columns depend directly on user_id and the WHERE clause permits only a single row.