MySQL: Why is this SQL query not using an index?

I have a very simple SELECT that resorts to filesort and does not use an index.
Consider the following query:
SELECT * FROM forum_topic
WHERE topic_status = 0
ORDER BY modified_date LIMIT 0, 30
on the following table (stripped of a few columns to make it more brief here)
CREATE TABLE `forum_topic` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`slug` varchar(255) NOT NULL,
`forum_id` int(10) NOT NULL DEFAULT '1',
`title` varchar(100) NOT NULL,
`topic_status` tinyint(1) NOT NULL DEFAULT '0',
`post_count` bigint(20) NOT NULL DEFAULT '0',
`modified_date` datetime NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `slug` (`slug`),
FULLTEXT KEY `title` (`title`),
KEY `modified` (`modified_date`, `topic_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EXPLAIN gives the following output
id select_type table partitions type possible_keys key key_len ref rows Extra
1 SIMPLE forum_topic NULL ALL NULL NULL NULL NULL 2075 Using where; Using filesort
Notice how the EXPLAIN shows NULL for possible_keys and how it's using filesort after having scanned ALL rows.
Please advise. Thanks.

This query needs topic_status to appear in the most significant position of an index, because it's searching on a constant.
You have
KEY `modified` (`modified_date`, `topic_status`)
and you may want
KEY `mod2` (`topic_status`, `modified_date` )
instead. This may satisfy both the filter and the ORDER BY ... LIMIT part of the query.
Pro tip: Avoid SELECT * and enumerate the columns you actually need instead.
Pro tip: Filesort doesn't necessarily mean what you think it means. It's used anytime MySQL needs to construct an intermediate result set for such things as sorting.
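The index suggested above can be tried out as follows. This is a minimal sketch: the name mod2 comes from the answer, while the column list in the SELECT is only an assumption about which columns the page actually needs.

```sql
-- Equality column first, then the ORDER BY column.
ALTER TABLE forum_topic
  ADD KEY mod2 (topic_status, modified_date);

-- Re-check the plan. Ideally `key` now shows mod2 and
-- "Using filesort" no longer appears in Extra.
EXPLAIN
SELECT id, slug, title, modified_date   -- assumed column list, instead of SELECT *
FROM forum_topic
WHERE topic_status = 0
ORDER BY modified_date
LIMIT 0, 30;
```

If the plan does not change, the optimizer may still prefer a scan on a table this small (about 2,000 rows), so it is worth re-testing with more data.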

Related

Slow query due to timestamp order by, can't figure out how to fix

I have a table (logs) holding approximately 100k rows. Each row has a timestamp associated with when it was created. When I sort by this timestamp, even with numerous WHERE criteria, the query is much slower than without a sort. I can't seem to find a way to speed it up. I've tried all kinds of indexes.
The query is returning about 25k rows. I have similar queries that need to be run, with slightly different WHERE criteria.
With the ORDER BY, the query takes 0.6 seconds. Without the ORDER BY, the query takes 0.003 seconds.
The table structure is as follows.
CREATE TABLE IF NOT EXISTS `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`shipment_id` int(11) DEFAULT NULL,
`time` timestamp NULL DEFAULT NULL,
`initials` varchar(50) DEFAULT NULL,
`result` int(11) DEFAULT NULL,
`information` int(11) DEFAULT NULL,
`issues` varchar(5) DEFAULT NULL,
`fw_actions` varchar(999) DEFAULT NULL,
`noncompliant` tinyint(4) DEFAULT NULL,
`noncompliant_lead_initials` varchar(50) DEFAULT NULL,
`noncompliant_lead_time` varchar(20) DEFAULT NULL,
`event_id` int(11) DEFAULT NULL,
`action_id` int(11) DEFAULT NULL,
`resolution_id` int(11) DEFAULT NULL,
`noncompliant_reviewed` tinyint(4) NOT NULL DEFAULT '0',
`violation` tinyint(4) DEFAULT NULL,
`approved` tinyint(4) NOT NULL DEFAULT '0',
`approved_time` timestamp NULL DEFAULT NULL,
`approver` int(11) DEFAULT NULL,
`reviewed` tinyint(4) NOT NULL DEFAULT '0',
`reviewed_time` timestamp NULL DEFAULT NULL,
`reviewer` int(11) DEFAULT NULL,
`editor` int(11) DEFAULT NULL,
`summary` varchar(999) DEFAULT NULL,
`updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `LOGS_SHIPMENT_ID_TIME` (`shipment_id`,`time`,`action_id`),
KEY `SHIPMENT_ID_IDX` (`shipment_id`),
KEY `logs_updated_index` (`updated`),
KEY `violation_idx` (`violation`,`approved`,`reviewed`,`shipment_id`,`time`,`reviewer`,`approver`,`editor`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=100022 ;
The query is
SELECT * FROM logs
WHERE (logs.approved != 1) AND (logs.violation = 1)
ORDER BY logs.`time` DESC
My EXPLAIN looks like this
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE logs ref violation_idx violation_idx 2 const 1000 Using index condition; Using where; Using filesort
Anyone have a trick here? Thanks!
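The key_len column says that MySQL is using only 2 bytes of the index "violation_idx". The "violation" column is a nullable tinyint, which takes one byte for the value plus one byte for the NULL flag, so those 2 bytes cover only the first column. The condition on "approved" is an inequality, so it is checked as a filter (hence "Using index condition") rather than used for the lookup.
You might be able to improve the performance of this query by moving "time" forward in this index, ideally to directly after "violation", so that the rows matching violation = 1 can be read in "time" order. Currently, it's the fifth column. I don't know what other queries you're doing; this kind of change might hurt performance in other queries.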
Also, you might be able to improve the performance by creating an additional index on the "time" column alone. Both those things are worth testing.
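Both experiments can be sketched like this. The index names are hypothetical, and the composite variant places "time" directly after "violation"; that placement is an assumption based on violation = 1 being the only equality condition in the query.

```sql
-- Option 1 (hypothetical name): composite index, `time` right after the equality column.
ALTER TABLE logs ADD INDEX violation_time_idx (violation, `time`, approved);

-- Option 2 (hypothetical name): index on `time` alone.
ALTER TABLE logs ADD INDEX time_idx (`time`);

-- Compare the plans: look at `key` and whether "Using filesort" is still in Extra.
EXPLAIN
SELECT * FROM logs
WHERE (logs.approved != 1) AND (logs.violation = 1)
ORDER BY logs.`time` DESC;
```

Time the query before and after each change, and drop whichever index does not help, since every extra index slows down writes.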
Most DBMSs will benefit from an index that has a descending sort on "time", but MySQL before 8.0 won't. The manual for those versions says:
An index_col_name specification can end with ASC or DESC. These
keywords are permitted for future extensions for specifying ascending
or descending index value storage. Currently, they are parsed but
ignored; index values are always stored in ascending order.
You'll have to find your own comfort level with that. Today, creating an index "DESC" expresses your intent clearly, but a future upgrade to MySQL that starts parsing and implementing that expression might hurt performance for other queries.
Create an index on just time. Further indexes can aid with additional filters in the direction of your time index.

MySQL select with where takes a long time

I have a table with about 700,000 rows:
CREATE TABLE IF NOT EXISTS `ext_log_entries` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`action` varchar(8) NOT NULL,
`logged_at` datetime NOT NULL,
`object_id` varchar(32) DEFAULT NULL,
`object_class` varchar(255) NOT NULL,
`version` int(11) NOT NULL,
`data` longtext COMMENT '(DC2Type:array)',
`username` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `log_date_lookup_idx` (`logged_at`),
KEY `log_user_lookup_idx` (`username`),
KEY `log_class_lookup_idx` (`object_class`),
KEY `log_version_lookup_idx` (`object_id`,`object_class`,`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1219777 ;
I try to run the following query:
SELECT n0_.id AS id0, n0_.action AS action1, n0_.logged_at AS logged_at2, n0_.object_id AS object_id3, n0_.object_class AS object_class4, n0_.version AS version5, n0_.data AS data6, n0_.username AS username7
FROM ext_log_entries n0_
WHERE n0_.object_id =275634
AND n0_.object_class = 'My\\MyBundle\\Entity\\Field'
AND n0_.version <=1
ORDER BY n0_.version ASC
Here is the MySQL plan:
id 1
select_type SIMPLE
table n0_
type ref
possible_keys log_class_lookup_idx,log_version_lookup_idx
key log_class_lookup_idx
key_len 767
ref const
rows 641159
Extra Using where; Using filesort
My query needs about 37 seconds to execute for only 1 row in the result...
I tried running the same query after deleting my indexes and it goes a little bit faster: about 31 seconds...
I don't understand why my query is taking so much time and why my indexes don't help the performance. Do you know what I can do to get good performance on this query?
Thanks in advance for your help !
EDIT
Here are the cardinalities of the indexes:
log_date_lookup_idx BTREE logged_at 1221578 A
log_user_lookup_idx BTREE username 40 A YES
log_class_lookup_idx BTREE object_class 1010 A
log_version_lookup_idx BTREE object_id 1221578 A YES
object_class 1221578 A
version 1221578 A
I found a solution, not THE solution, but at least it works for me.
I think it could help anyone who is using Gedmo Loggable and who is lucky enough (like me) to have objects with only integer IDs.
I changed my column object_id from varchar to integer. My query now takes 0.008 seconds! It works for me because I'm sure I'll always have only integers. For people who have varchar IDs, I'm sorry; I tried many things but nothing worked...
CREATE TABLE IF NOT EXISTS `ext_log_entries` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`action` varchar(8) NOT NULL,
`logged_at` datetime NOT NULL,
`object_id` int(11) DEFAULT NULL,
`object_class` varchar(255) NOT NULL,
`version` int(11) NOT NULL,
`data` longtext COMMENT '(DC2Type:array)',
`username` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `log_date_lookup_idx` (`logged_at`),
KEY `log_user_lookup_idx` (`username`),
KEY `log_class_lookup_idx` (`object_class`),
KEY `log_version_lookup_idx` (`object_id`,`object_class`,`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1219777 ;
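For readers who must keep object_id as a varchar, one thing worth testing is the literal in the WHERE clause. The original query compares the string column with an unquoted number (object_id = 275634), and MySQL generally cannot use an index on a string column for that kind of comparison. A sketch against the original varchar schema, with a shortened column list chosen only for illustration:

```sql
EXPLAIN
SELECT n0_.id, n0_.version
FROM ext_log_entries n0_
WHERE n0_.object_id = '275634'   -- quoted, so it is compared as a string like the column
  AND n0_.object_class = 'My\\MyBundle\\Entity\\Field'
  AND n0_.version <= 1
ORDER BY n0_.version ASC;
```

If this is the cause, the plan should switch to log_version_lookup_idx with a much smaller rows estimate. Whether the ORM can be made to bind the value as a string is a separate question that this sketch does not answer.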

Very slow query when using ORDER BY and LIMIT?

The following query takes 10 seconds to finish with the ORDER BY. Without the ORDER BY it finishes in 0.0005 seconds. I already have indexes on the fields "sku", "vid" and "timestamp". I have more than 200,000 records in this table. Please help: what is wrong with the query when using ORDER BY?
SELECT i.pn,i.sku,i.title, fl.f_inserted,fl.f_special, fl.f_notinserted
FROM inventory i
LEFT JOIN inventory_flags fl ON fl.sku = i.sku AND fl.vid = i.vid
WHERE i.qty >=2 ORDER BY i.timestamp LIMIT 0,100;
-- --------------------------------------------------------
--
-- Table structure for table `inventory`
--
CREATE TABLE IF NOT EXISTS `inventory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`pn` varchar(60) DEFAULT NULL,
`sku` varchar(60) DEFAULT NULL,
`title` varchar(60) DEFAULT NULL,
`qty` int(11) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `sku` (`sku`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `inventory_flags`
--
CREATE TABLE IF NOT EXISTS `inventory_flags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`f_inserted` tinyint(1) DEFAULT NULL,
`f_notinserted` tinyint(1) DEFAULT NULL,
`f_special` tinyint(1) DEFAULT NULL,
`timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`sku` varchar(60) DEFAULT NULL,
`vid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `vid` (`vid`),
KEY `sku` (`sku`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE fl system vid,sku NULL NULL NULL 0 const row not found
1 SIMPLE i index NULL timestamp 5 NULL 10 Using where
Instead of adding separate indexes on individual columns, put a multi-column index on each table, since you are using more than one column from the same table in the join condition.
After the columns from the WHERE clause, also include the columns used in the ORDER BY clause in the composite index.
Try adding the following indexes and test them using EXPLAIN:
ALTER TABLE inventory_flags ADD INDEX ix_if (sku, vid);
ALTER TABLE inventory ADD INDEX ix_i (sku, qty, timestamp);
Also try to avoid the DISTINCT clause in your query; it is equivalent to a GROUP BY clause. If you still need it, consider adding a covering index.
If sku is unique to each inventory item then define it as UNIQUE - it'll speed things up. (Or the combination of sku and vid - define a composite index in that case.)
Why are you doing SELECT DISTINCT? The vast majority of the time using DISTINCT is a sign that your query or your table structure is wrong.
Since it's DISTINCT, and sku is not UNIQUE it can't use the index on timestamp to speed things up, so it has to sort a table with 200,000 records - it can't even use an index on qty to speed that part up.
PS. Omesh has some good advice as well.
You can use FORCE INDEX (index_key). Try it, and you will see in the EXPLAIN output that MySQL now uses that index for the ORDER BY.
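A sketch of what that hint could look like on the query from the question, assuming the existing index named `timestamp` on inventory is the one to force:

```sql
EXPLAIN
SELECT i.pn, i.sku, i.title, fl.f_inserted, fl.f_special, fl.f_notinserted
FROM inventory i FORCE INDEX (`timestamp`)   -- hint goes right after the table alias
LEFT JOIN inventory_flags fl ON fl.sku = i.sku AND fl.vid = i.vid
WHERE i.qty >= 2
ORDER BY i.timestamp
LIMIT 0, 100;
```

The EXPLAIN in the question already shows the timestamp index being chosen, so the hint may change nothing here; compare timings with and without it before keeping it.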

MySQL Slow query: count articles, group by category, any way to optimize?

Is there any way to optimize this query? It takes more than 2.5 secs.
SELECT articles.categories_id,
COUNT(articles.id) AS count
FROM articles
WHERE articles.created >= DATE_SUB(NOW(), INTERVAL 48 HOUR)
GROUP BY articles.categories_id
ORDER BY count DESC
CREATE TABLE IF NOT EXISTS `articles` (
`id` int(11) NOT NULL,
`categories_id` int(11) NOT NULL,
`feeds_id` int(11) NOT NULL DEFAULT '0',
`users_id` int(11) NOT NULL DEFAULT '1',
`title` varchar(255) CHARACTER SET utf8 NOT NULL,
`sefriendly` varchar(255) CHARACTER SET utf8 NOT NULL,
`body` text CHARACTER SET utf8,
`source` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`created` datetime NOT NULL,
`edited` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`fingerprint` varchar(32) CHARACTER SET utf8 NOT NULL,
`type` int(1) NOT NULL DEFAULT '1' COMMENT '1 => Feed fetched article\n2 => User submitted article',
`description` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`keywords` text CHARACTER SET utf8,
`status` int(1) NOT NULL DEFAULT '0' COMMENT '0 => Passive\n1 => Active\n2 => Pending',
PRIMARY KEY (`id`),
KEY `categories_id` (`categories_id`) USING BTREE,
KEY `feeds_id` (`feeds_id`) USING BTREE,
KEY `users_id` (`users_id`) USING BTREE,
KEY `fingerprint` (`fingerprint`) USING BTREE,
KEY `title` (`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_turkish_ci ROW_FORMAT=COMPACT;
I already use caching, so there is no problem in terms of code.
This is the explain sql result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE articles index NULL categories_id 4 NULL 120411 Using where; Using temporary; Using filesort
Thanks.
You can improve things by adding an index on created. This will help serve your WHERE clause:
WHERE articles.created >= DATE_SUB(NOW(), INTERVAL 48 HOUR)
It will probably be advantageous to instead create a covering index on (created, categories_id) so that all the data required for the query is available in the index.
Note that the value of id is not needed for this query because COUNT only cares if the value is NULL or NOT NULL, but id is defined to be NOT NULL in your table definition. It would probably be a good idea to make this explicit by using COUNT(*) or COUNT(1) instead of COUNT(id) as this is guaranteed to give the same result. But I would expect that MySQL is intelligent enough to make this optimization for you automatically.
I don't think that you can avoid the file sort because you are sorting on the result of an aggregation, and this cannot be indexed.
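Put together, the suggestions in this answer could look like the following sketch. The index name is hypothetical.

```sql
-- Covering index: the range column for the WHERE clause, then the GROUP BY column.
ALTER TABLE articles
  ADD INDEX created_category_idx (created, categories_id);

-- Same query with COUNT(*); check for "Using index" in Extra,
-- which indicates the query is answered from the index alone.
EXPLAIN
SELECT articles.categories_id,
       COUNT(*) AS count
FROM articles
WHERE articles.created >= DATE_SUB(NOW(), INTERVAL 48 HOUR)
GROUP BY articles.categories_id
ORDER BY count DESC;
```

"Using temporary; Using filesort" is still expected for the final sort on the aggregate, but it then applies only to the rows from the last 48 hours.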

optimize query (2 simple left joins)

SELECT fcat.id,fcat.title,fcat.description,
count(DISTINCT ftopic.id) as number_topics,
count(DISTINCT fpost.id) as number_posts FROM fcat
LEFT JOIN ftopic ON fcat.id=ftopic.cat_id
LEFT JOIN fpost ON ftopic.id=fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
Indexes on ftopic.cat_id, fpost.topic_id, fcat.ord
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE fcat ALL PRIMARY NULL NULL NULL 11 Using temporary; Using filesort
1 SIMPLE ftopic ref PRIMARY,cat_id_2 cat_id_2 4 bloki.fcat.id 72
1 SIMPLE fpost ref topic_id_2 topic_id_2 4 bloki.ftopic.id 245
fcat - 11 rows,
ftopic - 1106 rows,
fpost - 363000 rows
Query takes 4.2 sec
TABLES:
CREATE TABLE IF NOT EXISTS `fcat` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(250) collate utf8_unicode_ci NOT NULL,
`description` varchar(250) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`ord` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ord` (`ord`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=12 ;
CREATE TABLE IF NOT EXISTS `ftopic` (
`id` int(11) NOT NULL auto_increment,
`cat_id` int(11) NOT NULL,
`title` varchar(100) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP,
`lastname` varchar(200) collate utf8_unicode_ci NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`closed` tinyint(4) NOT NULL default '0',
`views` int(11) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `cat_id_2` (`cat_id`,`updated`,`visible`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1116 ;
CREATE TABLE IF NOT EXISTS `fpost` (
`id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
`pet_id` int(11) NOT NULL,
`content` text collate utf8_unicode_ci NOT NULL,
`imageName` varchar(300) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`reply_id` int(11) NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`md5` varchar(100) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `topic_id_2` (`topic_id`,`created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=390971 ;
Thanks,
hamlet
You need to create a key on both fcat.id and fcat.ord.
Bold rewrite
This code is not functionally identical, but...
Because you want to know about distinct ftopic.id and fpost.id, I'm going to be bold and suggest two INNER JOINs instead of LEFT JOINs.
Each fpost.id then appears only once in the joined rows, so the DISTINCT on it can be dropped. ftopic.id still repeats once per post, so it keeps its DISTINCT; without it, number_topics would simply equal number_posts.
SELECT
fcat.id
, fcat.title
, fcat.description
, count(DISTINCT ftopic.id) as number_topics
, count(fpost.id) as number_posts
FROM fcat
INNER JOIN ftopic ON fcat.id = ftopic.cat_id
INNER JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
It depends on your data if this is what you are looking for, but I'm guessing it will be faster.
All your indexes seem to be in order though.
MySQL does not use indexes for small sample sizes!
Note that the EXPLAIN lists only 11 rows for MySQL to consider for fcat. This is not enough for MySQL to really start worrying about indexes, so it doesn't.
Going to the index for small row counts slows things down.
MySQL is trying to speed things up, so it chooses not to use the index. This confuses a lot of people because we are trained so hard on the index. Small sample sizes don't give good explains!
Increase the size of the test data so MySQL has more rows to consider and you should start seeing the index being used.
Common misconceptions about force index
Force index does not force MySQL to use an index as such.
It hints at MySQL to use a different index from the one it might naturally use and it pushes MySQL into using an index by setting a very high cost on a table scan.
(In your case MySQL is not using a table scan, so force index has no effect)
MySQL (like most other DBMSs on the planet) has a very strong urge to use indexes, so if it doesn't use any, that's because using no index at all is faster.
How does MySQL know which index to use
One of the parameters the query optimizer uses is the stored cardinality of the indexes.
Over time these values change... But studying the table takes time, so MySQL doesn't do that unless you tell it to.
Another parameter that affects index selection is the predicted disk-seek-times that MySQL expects to encounter when performing the query.
Tips to improve index usage
ANALYZE TABLE will instruct MySQL to re-evaluate the indexes and update its key distribution (cardinality). (consider running it daily/weekly in a cron job)
SHOW INDEX FROM table will display the key distribution.
MyISAM tables and indexes fragment over time. Use OPTIMIZE TABLE to defragment the tables and rebuild the indexes.
FORCE/USE/IGNORE INDEX limits the options MySQL's query optimizer has to perform your query. Only consider it on complex queries.
Time the effect of your meddling with indexes on a regular basis. A forced index that speeds up your query today might slow it down tomorrow because the underlying data has changed.
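The tips above map onto statements like these, shown against the tables from the question. Which table to inspect or optimize is an arbitrary choice for illustration.

```sql
-- Refresh the key distribution (cardinality) statistics.
ANALYZE TABLE fcat, ftopic, fpost;

-- Inspect the stored cardinality of each index.
SHOW INDEX FROM fpost;

-- Defragment a MyISAM table and rebuild its indexes (locks the table while it runs).
OPTIMIZE TABLE fpost;

-- Index hint: compare this plan with the one without the hint before keeping it.
EXPLAIN
SELECT fcat.id, fcat.title
FROM fcat FORCE INDEX (ord)
ORDER BY fcat.ord
LIMIT 100;
```

Re-run the EXPLAIN and time the real query after each step, so that you know which change actually made the difference.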