MySQL is not using index on very simple GROUP BY query

Here is my table create/schema:
CREATE TABLE `_test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(10) unsigned NOT NULL,
`total` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `group_id` (`group_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
So there are two keys: PRIMARY on id and an INDEX on group_id.
Here is a data sample:
Now, the thing is that when I run EXPLAIN for the following simple query:
EXPLAIN SELECT SUM( total ) FROM _test GROUP BY (group_id)
I am getting no use of any key, despite one clearly being created on the group_id column:
Any ideas why MySQL is not trying to use the group_id index for that query?

Related

Slow query when using GROUP BY

I have the following query:
SELECT id, user_id, cookieId, text_date
FROM `_history`
WHERE text_date BETWEEN '2014-09-01' AND '2014-10-01' AND user_id = 1
GROUP BY cookieId
ORDER BY id DESC
My table schema:
CREATE TABLE `_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`cookieId` varchar(50) NOT NULL,
`text_from` varchar(50) NOT NULL,
`text_body` text NOT NULL,
`text_date` datetime NOT NULL,
`aName` varchar(50) NOT NULL,
`hasArrived` enum('0','1') NOT NULL,
`agent_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `cookieId` (`cookieId`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
An EXPLAIN yields this:
1 SIMPLE _history ref cookieId,user_id user_id 4 const 49837 Using where; Using temporary; Using filesort
Sometimes the query takes 2 seconds and sometimes it's up to 5 seconds.
Any ideas how to make this run faster?
The group by does nothing at the moment so drop it.
The user_id already has an index on it, so the query and sort on it are fine.
The text_date has no index on it, adding an index on it should speed up your query.
If this query occurs often, add a composite index on both user_id and text_date.
eg.
create index idx_text_date on `_history` (text_date);
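The composite index mentioned above could look like the following. This is a minimal sketch: the index name idx_user_id_text_date is made up for illustration, and the column order assumes the query always filters user_id by equality and text_date by range.

```sql
-- Hypothetical index name. The equality column (user_id) comes first and the
-- range column (text_date) second, so MySQL can seek to one user and then
-- range-scan that user's dates.
CREATE INDEX idx_user_id_text_date ON `_history` (user_id, text_date);

-- Re-check the plan afterwards to see whether the new index is chosen.
EXPLAIN SELECT id, user_id, cookieId, text_date
FROM `_history`
WHERE user_id = 1
  AND text_date BETWEEN '2014-09-01' AND '2014-10-01';
```

Whether the optimizer actually picks this index depends on the data distribution, so compare the EXPLAIN output before and after.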
Based on the comments, the query should look like this:
SELECT cookieId, max(id), max(user_id), max(text_date)
FROM `_history`
WHERE text_date BETWEEN '2014-09-01' AND '2014-10-01'
AND user_id = 1
GROUP BY cookieId
ORDER BY max(id) DESC
And the index should look like this:
create index idx__history_text_date_cookieId on `_history` (text_date, cookieId);
Create a composite index on (user_id, cookieId).
CREATE TABLE `_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`cookieId` varchar(50) NOT NULL,
`text_from` varchar(50) NOT NULL,
`text_body` text NOT NULL,
`text_date` datetime NOT NULL,
`aName` varchar(50) NOT NULL,
`hasArrived` enum('0','1') NOT NULL,
`agent_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `cookieId` (`cookieId`),
KEY `user_id_X_cookieId` (`user_id`, `cookieId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It will then be able to use the user_id index to find the rows, and use the cookieId suffix of that index to group them.
When you have this index, you don't need the user_id index, because a prefix of a composite index can be used as an index.
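On an existing table the same change can be applied without recreating it. A minimal sketch, assuming the index names from the schemas above (`user_id` for the old single-column key, `user_id_X_cookieId` for the new composite one):

```sql
-- Swap the single-column index for the composite one in a single statement.
-- The (user_id) prefix of the new index still serves lookups on user_id alone.
ALTER TABLE `_history`
    DROP INDEX `user_id`,
    ADD INDEX `user_id_X_cookieId` (`user_id`, `cookieId`);
```

On a large MyISAM table this rebuilds the table, so it is probably best run during a quiet period.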

MySQL query is very slow after using DISTINCT and GROUP BY?

I have tables with following structure:
-- Table structure for table `temp_app`
--
CREATE TABLE IF NOT EXISTS `temp_app` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`vid` int(5) NOT NULL,
`num` varchar(64) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `num` (`num`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=69509;
-- Table structure for table `inv_flags`
--
CREATE TABLE IF NOT EXISTS `inv_flags` (
`num` varchar(64) NOT NULL,
`vid` int(11) NOT NULL,
`f_special` tinyint(1) NOT NULL, /*0 or 1*/
`f_inserted` tinyint(1) NOT NULL, /*0 or 1*/
`f_notinserted` tinyint(1) NOT NULL, /*0 or 1*/
`userID` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `num` (`num`),
KEY `userID` (`userID`),
KEY `vid` (`vid`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Execution time of the following query is 9 seconds to display 30 records. What is wrong?
SELECT date_format(ifs.`timestamp`,'%y/%m/%d') as `date`
,count(DISTINCT ta.num) as inserted /*Unique nums*/
,SUM(ifs.f_notinserted) as not_inserted
,SUM(ifs.f_special) as special
,count(ta.num) as links /*All nums*/
from inv_flags ifs
LEFT JOIN temp_app ta ON ta.num = ifs.num AND ta.vid = ifs.vid
WHERE ifs.userID = 3
GROUP BY date(ifs.`timestamp`) DESC LIMIT 30
EXPLAIN result:
id  select_type  table  type  possible_keys  key     key_len  ref      rows   Extra
1   SIMPLE       ifs    ref   userID         userID  4        const    12153  Using where
1   SIMPLE       ta     ref   vid,num        num     194      ifs.num  1
COUNT(DISTINCT ...) can sometimes perform very badly in MySQL. Try this instead:
select count(*) from (select distinct...
as it can sometimes prevent MySQL from writing the entire interim result to disk.
Here is the MySQL bug info:
http://bugs.mysql.com/bug.php?id=21849
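To illustrate the shape of that rewrite, here is a minimal sketch on the temp_app table alone. The filter value vid = 1 is a placeholder, and this is not a drop-in replacement for the grouped query in the question, which would also need the per-day grouping carried into the derived table.

```sql
-- Original form: one aggregate with DISTINCT.
SELECT COUNT(DISTINCT ta.num) AS inserted
FROM temp_app AS ta
WHERE ta.vid = 1;          -- hypothetical filter value

-- Rewritten form: de-duplicate in a derived table, then count its rows.
SELECT COUNT(*) AS inserted
FROM (
    SELECT DISTINCT ta.num
    FROM temp_app AS ta
    WHERE ta.vid = 1       -- same hypothetical filter
) AS distinct_nums;
```

The two forms return the same number here because num is declared NOT NULL; with a nullable column, COUNT(DISTINCT ...) skips NULLs while the derived table would count NULL as one row.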

SQL: Refactoring a multi-join query

I have a query that should be quite simple and yet it causes me a lot of headaches.
I have a simple ads system that requires filtering ads according to a few variables.
I need to limit the number of views/clicks per day and the total number of views/clicks for a given ad. Also each ad is linked to one or more slots in which the ad can appear. I have a table that saves the statistics that I need about each ad. Note that the statistics table changes very frequently.
These are the tables that I'm using:
CREATE TABLE `t_ads` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`content` text NOT NULL,
`is_active` tinyint(1) unsigned NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`max_views` int(10) unsigned NOT NULL,
`type` tinyint(3) unsigned NOT NULL default '0',
`refresh` smallint(5) unsigned NOT NULL default '0',
`max_clicks` int(10) unsigned NOT NULL,
`max_daily_clicks` int(10) unsigned NOT NULL,
`max_daily_views` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ad_slots` (
`id` int(10) unsigned NOT NULL auto_increment ,
`name` varchar(255) NOT NULL,
`width` int(10) unsigned NOT NULL,
`height` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ads_to_slots` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`ad_id`,`slot_id`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ads_to_slots`
ADD CONSTRAINT `t_ads_to_slots_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ads_to_slots_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
CREATE TABLE `t_ad_stats` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`date` date NOT NULL,
`views` int(10) unsigned NOT NULL,
`unique_views` int(10) unsigned NOT NULL,
`clicks` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`ad_id`,`slot_id`,`date`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ad_stats`
ADD CONSTRAINT `t_ad_stats_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ad_stats_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
This is the query that I use to get ads for a given slot. (Note that in this example I hard-coded 20 as the slot id and 0,1,2 as the ad type; I get this data from a PHP script which invokes this query.)
SELECT `ads`.`content`, `slots`.`value`, `ads`.`id`, `ads`.`refresh`, `ads`.`type`,
SUM(`total_stats`.`views`) AS "total_views",
SUM(`total_stats`.`clicks`) AS "total_clicks"
FROM (`t_ads` AS `ads`,
`t_ads_to_slots` AS `slots`)
LEFT JOIN `t_ad_stats` AS `total_stats`
ON `total_stats`.`ad_id` = `ads`.`id`
LEFT JOIN `t_ad_stats` AS `daily_stats`
ON (`daily_stats`.`ad_id` = `ads`.`id`) AND
(`daily_stats`.`date` = CURDATE())
WHERE (`ads`.`id` = `slots`.`ad_id`) AND
(`ads`.`type` IN(0,1,2)) AND
(`slots`.`slot_id` = 20) AND
(`ads`.`is_active` = 1) AND
(`ads`.`end_date` >= NOW()) AND
(`ads`.`start_date` <= NOW()) AND
((`ads`.`max_views` = 0) OR
(`ads`.`max_views` > "total_views")) AND
((`ads`.`max_clicks` = 0) OR
(`ads`.`max_clicks` > "total_clicks")) AND
((`ads`.`max_daily_clicks` = 0) OR
(`ads`.`max_daily_clicks` > IFNULL(`daily_stats`.`clicks`,0))) AND
((`ads`.`max_daily_views` = 0) OR
(`ads`.`max_daily_views` > IFNULL(`daily_stats`.`views`,0)))
GROUP BY (`ads`.`id`)
I believe that this query is self-explanatory, even though it's quite long. Note that the MySQL version that I'm using is 5.0.51a-community. It seems to me that the big issue here is the double join to the stats table (I did that so that I can get the data both from a specific record and from multiple records (sum)).
How would you implement this query in order to get better results? (Note that I can't change from InnoDB).
Hopefully everything is clear about my question, but if that is not the case, please ask and I will clarify.
Thanks in advance,
Kfir
Add indexes to following columns:
t_ads.is_active
t_ads.start_date
t_ads.end_date
Change the order of the primary key on t_ad_stats to:
(`ad_id`,`date`,`slot_id`)
or add a covering index to t_ad_stats on
(`ad_id`, `date`)
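As a rough sketch, the index suggestions above translate into statements like these. All index names are invented for illustration, and this uses the additional-index variant rather than reordering the primary key.

```sql
-- Single-column indexes on the t_ads filter columns (hypothetical names).
ALTER TABLE `t_ads`
    ADD INDEX `idx_is_active` (`is_active`),
    ADD INDEX `idx_start_date` (`start_date`),
    ADD INDEX `idx_end_date` (`end_date`);

-- Secondary index so the daily_stats join can seek on (ad_id, date)
-- without knowing the slot_id.
ALTER TABLE `t_ad_stats`
    ADD INDEX `idx_ad_id_date` (`ad_id`, `date`);
```

Since t_ad_stats changes very frequently, each extra index adds write cost, so it is worth measuring before keeping all of them.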
Change from 0 meaning "no limit" to 2147483647 meaning no limit, so you can change things like:
((`ads`.`max_views` = 0) OR (`ads`.`max_views` > "total_views"))
to
(`ads`.`max_views` > "total_views")
You could greatly improve this if you kept running totals instead of having to calculate them each time.
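One possible way to keep running totals is sketched below. The total_views and total_clicks columns are hypothetical additions to t_ads, and the ids 42 and 20 are placeholder values.

```sql
-- Hypothetical running-total columns on the ads table.
ALTER TABLE `t_ads`
    ADD COLUMN `total_views` int(10) unsigned NOT NULL default '0',
    ADD COLUMN `total_clicks` int(10) unsigned NOT NULL default '0';

-- When a view is recorded for ad 42 in slot 20 (placeholder ids),
-- bump today's stats row, creating it if it does not exist yet...
INSERT INTO `t_ad_stats` (`ad_id`, `slot_id`, `date`, `views`, `unique_views`, `clicks`)
VALUES (42, 20, CURDATE(), 1, 0, 0)
ON DUPLICATE KEY UPDATE `views` = `views` + 1;

-- ...and the running total on the ad itself.
UPDATE `t_ads` SET `total_views` = `total_views` + 1 WHERE `id` = 42;
```

With totals stored on t_ads, the limit check becomes a plain column comparison such as `ads`.`max_views` > `ads`.`total_views`, and the join to total_stats is no longer needed. Running both statements in one transaction keeps the two counters consistent.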
Expanding on a comment above I believe that the following columns should be indexed:
ads.id
ads.type
ads.start_date
ads.end_date
daily_stats.date
As well as these:
slots.slot_id
ads.is_active
And these as well:
ads.max_views
ads.max_clicks
ads.max_daily_clicks
ads.max_daily_views
daily_stats.clicks
daily_stats.views
Do note that applying indexes on these columns will speed up your SELECTs but slow down your INSERTs, since the indexes will need updating as well. You don't have to apply all of this at once, though. You can do it incrementally and see how the performance shakes out for selects as well as inserts. If you cannot find a good middle ground, then I would suggest denormalization.

Nested "select ... in" performance is slow - how to fix?

Here I have a simple join query. If the first two selects return results, the whole query completes in 0.3 seconds, but if they don't return any results, the whole query takes more than half a minute. What causes this difference? How can I fix this problem and improve the performance?
SELECT * FROM music WHERE id IN
(
SELECT id FROM music_tag_map WHERE tag_id IN
(
SELECT id FROM tag WHERE content ='xxx'
)
)
LIMIT 10
Here's the table structure:
CREATE TABLE `tag` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index2` (`content`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `music` (
`id` int(7) NOT NULL AUTO_INCREMENT,
`name` varchar(500) NOT NULL,
`othername` varchar(200) DEFAULT NULL,
`player` varchar(3000) DEFAULT NULL,
`genre` varchar(100) DEFAULT NULL,
`sounds` text,
`create_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `player` (`player`(255)),
KEY `name` (`othername`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `music_tag_map` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`music_id` int(7) NOT NULL,
`tag_id` int(7) NOT NULL,
`times` int(11) DEFAULT '1',
PRIMARY KEY (`id`),
KEY `music_id` (`music_id`),
KEY `tag_id` (`tag_id`),
CONSTRAINT `music_tag_map_ibfk_1` FOREIGN KEY (`id`) REFERENCES `music` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `music_tag_map_ibfk_2` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
There are no joins in that query; there are two sub-selects.
A joined query would be:
SELECT *
FROM music
JOIN music_tag_map ON music.id=music_tag_map.id
JOIN tag ON music_tag_map.tag_id=tag.id
WHERE tag.content = ?
LIMIT 10;
An EXPLAIN applied to each will show you why the join performs better than the sub-select: the sub-select will scan the entire music table (the primary query), while the optimizer can pick the order of tables to scan for the joins, allowing MySQL to use indices to get only the needed rows from all the tables.
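A sketch of that comparison, using the literal 'xxx' from the question in place of the ? placeholder:

```sql
-- Plan for the nested IN form. On older MySQL versions the inner selects
-- may be reported as DEPENDENT SUBQUERY, evaluated per row of music.
EXPLAIN
SELECT * FROM music WHERE id IN
(
    SELECT id FROM music_tag_map WHERE tag_id IN
    (
        SELECT id FROM tag WHERE content = 'xxx'
    )
);

-- Plan for the join form, where the optimizer is free to start from tag.
EXPLAIN
SELECT music.*
FROM music
JOIN music_tag_map ON music.id = music_tag_map.id
JOIN tag ON music_tag_map.tag_id = tag.id
WHERE tag.content = 'xxx';
```

Comparing the type and rows columns of the two plans should show where the full scan happens.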

MySQL ORDER BY optimization in many to many tables

Tables:
CREATE TABLE IF NOT EXISTS `posts` (
`post_n` int(10) NOT NULL auto_increment,
`id` int(10) default NULL,
`date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`post_n`,`visibility`),
KEY `id` (`id`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE IF NOT EXISTS `subscriptions` (
`subscription_n` int(10) NOT NULL auto_increment,
`id` int(10) NOT NULL,
`subscribe_id` int(10) NOT NULL,
PRIMARY KEY (`subscription_n`),
KEY `id` (`id`),
KEY `subscribe_id` (`subscribe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Query:
SELECT posts.* FROM posts, subscriptions
WHERE posts.id=subscriptions.subscribe_id AND subscriptions.id=1
ORDER BY date DESC LIMIT 0, 15
It's slow because MySQL uses the indexes "id" and "subscribe_id" but not the index "date", so the ordering is very slow.
Are there any options to change the query, the indexes, or the architecture?
Possible Improvements:
First, you'll gain a couple microseconds per query if you name your fields instead of using SELECT posts.* which causes a schema lookup. Change your query to:
SELECT posts.post_n, posts.id, posts.date
FROM posts, subscriptions
WHERE posts.id=subscriptions.subscribe_id
AND subscriptions.id=1
ORDER BY date DESC
LIMIT 0, 15
Next, this requires MySQL 5.1 or higher, but you might want to consider partitioning your tables. You might consider KEY partitioning for both tables.
This should get you started.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-types.html
E.g.
SET SQL_MODE = 'ANSI';
-- to allow default date
CREATE TABLE IF NOT EXISTS `posts` (
`post_n` int(10) NOT NULL auto_increment,
`id` int(10) default NULL,
`date` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`post_n`,`id`),
KEY `id` (`id`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY KEY(id) PARTITIONS 32;
--
CREATE TABLE IF NOT EXISTS `subscriptions` (
`subscription_n` int(10) NOT NULL auto_increment,
`id` int(10) NOT NULL,
`subscribe_id` int(10) NOT NULL,
PRIMARY KEY (`subscription_n`,`subscribe_id`),
KEY `id` (`id`),
KEY `subscribe_id` (`subscribe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin
PARTITION BY KEY(subscribe_id) PARTITIONS 32;
I had to adjust your primary key a bit, so beware: this may NOT work for you. Please test it to make sure; I hope it does, though. Run sysbench against the old and new structures/queries to compare results before going to production.
:-)
If you're able to modify the table, you could add a multi-field index containing both ID and date. (or modify one of the existing keys to contain them both).
If you can't make changes to the database, and if you know that your result set is going to be small, you can force it to use a specific named key with USE KEY(name). The ordering would then be done after the fact, just on the results returned.
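Both suggestions might be sketched like this. The index name id_date is invented for illustration, and the column list follows the trimmed posts schema from the question.

```sql
-- Multi-field index: id for the join lookup, date for the ordering
-- (hypothetical index name).
ALTER TABLE `posts` ADD INDEX `id_date` (`id`, `date`);

-- Index hint naming that key. USE KEY is a synonym for USE INDEX.
SELECT posts.post_n, posts.id, posts.date
FROM posts USE INDEX (`id_date`), subscriptions
WHERE posts.id = subscriptions.subscribe_id
  AND subscriptions.id = 1
ORDER BY posts.date DESC
LIMIT 0, 15;
```

USE INDEX is only a hint that restricts which indexes are considered; FORCE INDEX is the stronger form if the optimizer still prefers a table scan.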
Hope that helps.