MySQL: Difference between LIKE 123 and = 123 regarding INDEX usage

I am experiencing very strange behaviour, which just turned out to be a matter of using the correct operator in my WHERE condition.
Assume the following table structure with several million rows:
CREATE TABLE `obj` (
`obj__id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`obj__obj_type__id` int(10) unsigned DEFAULT NULL,
`obj__title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`obj__const` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`obj__description` text COLLATE utf8_unicode_ci,
`obj__created` datetime DEFAULT NULL,
`obj__created_by` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`obj__updated` datetime DEFAULT NULL,
`obj__updated_by` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`obj__property` int(10) unsigned DEFAULT '0',
`obj__status` int(10) unsigned DEFAULT '1',
`obj__sysid` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`obj__scantime` datetime DEFAULT NULL,
`obj__imported` datetime DEFAULT NULL,
`obj__hostname` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`obj__undeletable` int(1) unsigned NOT NULL DEFAULT '0',
`obj__rt_cf__id` int(11) unsigned DEFAULT NULL,
`obj__cmdb_status__id` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`obj__id`),
KEY `obj_FKIndex1` (`obj__obj_type__id`),
KEY `obj_ibfk_2` (`obj__cmdb_status__id`),
KEY `obj__sysid` (`obj__sysid`),
KEY `obj__title` (`obj__title`),
KEY `obj__const` (`obj__const`),
KEY `obj__hostname` (`obj__hostname`),
KEY `obj__status` (`obj__status`),
KEY `obj__updated_by` (`obj__updated_by`)
) ENGINE=InnoDB AUTO_INCREMENT=7640131 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
A very simple SELECT with two conditions, ordered by obj__title with a LIMIT of 500, performs quite slowly (500 ms):
SELECT SQL_NO_CACHE * FROM obj WHERE (obj__status = 2) AND (obj__obj_type__id = 59) ORDER BY obj__title ASC LIMIT 0, 500;
Without the "ORDER BY obj__title" it runs like a charm (<1ms).
EXPLAIN tells me that MySQL performs a filesort and does not use the obj__title index. So, OK, it is quite obvious why this query is slow:
id: 1
select_type: SIMPLE
table: obj
type: index_merge
possible_keys: obj_FKIndex1,obj__status
key: obj_FKIndex1,obj__status
key_len: 5,5
ref: NULL
rows: 1336
Extra: Using intersect(obj_FKIndex1,obj__status); Using where; Using filesort
When I force MySQL to use the obj__title index with FORCE INDEX or USE INDEX, it no longer uses the other indexes, and performance is very poor again. But never mind: it is quite obvious that the poor performance has something to do with the combination of the two conditions and the ORDER BY.
After spending hours investigating how to optimize this query, I came up with a very simple change: I switched the operator in my conditions from = to LIKE. So my query now looks like this:
EXPLAIN SELECT SQL_NO_CACHE * FROM obj WHERE (obj__status LIKE 2) AND (obj__obj_type__id LIKE 59) ORDER BY obj__title ASC LIMIT 0, 500;
This is what happened:
id: 1
select_type: SIMPLE
table: obj
type: index
possible_keys: obj_FKIndex1,obj__status
key: obj__title
key_len: 768
ref: NULL
rows: 500
Extra: Using where
The query now takes 150 ms. Frankly, I was shocked.
I am not really happy with the speed, but at least it is performing OK.
But what I would really like to know is why LIKE uses the index and = does not. I did not find any hints about that in the MySQL documentation, only a few notes about LIKE being case-insensitive and LIKE behaving a bit differently for VARCHARs > 255 or other CHAR or TEXT fields. Not a single word about its behaviour with integers.
Can someone shed light on this situation? Any Database design or query tips to speed up the query more are very welcome as well!

For this query:
SELECT SQL_NO_CACHE *
FROM obj
WHERE (obj__status = 2) AND (obj__obj_type__id = 59)
ORDER BY obj__title ASC
LIMIT 0, 500;
The best index is obj(obj__status, obj__obj_type__id, obj__title).
Without such an index, I would expect MySQL to use an index on one of the two WHERE columns.
However, when you use like, you are comparing numbers to strings. This generally prevents an index from being used. The only possible index is for the order by, which happens to work in your case.
But, the proper index should have better performance.

The ORDER BY has to be satisfied before the LIMIT is applied. If there are a boatload of rows and MySQL performs a sort operation (shown as "Using filesort" in the Extra column), that can be expensive.
MySQL can also satisfy an ORDER BY obj__title without performing a sort operation, by making use of an index with a leading column of obj__title. And that's what you see happening when you change the predicates. EXPLAIN shows that the index on obj__title is being used, there's no sort operation. But MySQL has to inspect each row, to see if it satisfies the predicates or not.
The LIKE predicate is causing the column to be evaluated in a string context, rather than numeric. That is, MySQL has to perform an implicit conversion from integer to varchar. And that prevents MySQL from using the index to satisfy the predicates. MySQL is basically being forced to do the conversion for every row in the table, in order to evaluate the predicate.
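Here is a toy Python model of the string-context point. It is not MySQL code. A B-tree on an integer column keeps its values in numeric order. The values that satisfy a LIKE pattern, however, need not sit next to each other in that order. In general, then, there is no single index range the optimizer could read instead of testing each row. like_to_regex and int_like are hypothetical helpers, and they ignore MySQL's escape character and collation rules.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern into a regex: % = any run, _ = one character.

    Simplified: ignores MySQL's backslash escape and collation rules.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def int_like(value: int, pattern) -> bool:
    """Evaluate `value LIKE pattern` on the string forms of both sides."""
    return like_to_regex(str(pattern)).fullmatch(str(value)) is not None


# A numeric index stores the integers in numeric order ...
numeric_index = sorted([2, 12, 20, 21, 59, 200, 1000])

# ... but the values matching LIKE '2%' are scattered through that order,
# so there is no single index range to read: every entry must be tested.
matches = [v for v in numeric_index if int_like(v, "2%")]
positions = [numeric_index.index(v) for v in matches]
print(matches, positions)  # [2, 20, 21, 200] [0, 2, 3, 5]
```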
For best performance of that first query:
SELECT SQL_NO_CACHE *
FROM obj
WHERE obj__status = 2
AND obj__obj_type__id = 59
ORDER BY obj__title ASC
LIMIT 0, 500
You'd want an index with leading columns:
.... ON obj (obj__status, obj__obj_type__id, obj__title)
Then, MySQL could satisfy both of the equality predicates and the order by making use of the single index.
Note that this makes the index on just the single column obj__status redundant. Any query making use of the index on obj__status could make use of the new index.
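Here is a toy Python model of why that composite index helps. It uses made-up rows and is not how InnoDB actually stores pages. The index is a sorted list of key tuples. The two equality values pick out one contiguous slice, and that slice is already in obj__title order. The first n entries can therefore be returned without a filesort.

```python
from bisect import bisect_left

# Made-up rows: (obj__id, obj__status, obj__obj_type__id, obj__title)
rows = [
    (1, 2, 59, "delta"), (2, 1, 59, "alpha"), (3, 2, 59, "bravo"),
    (4, 2, 60, "aardvark"), (5, 2, 59, "charlie"), (6, 2, 59, "alpha"),
]

# Toy model of INDEX(obj__status, obj__obj_type__id, obj__title): a sorted list of
# key tuples, each ending with the primary key (InnoDB appends it to secondary keys).
index = sorted((status, type_id, title, pk) for pk, status, type_id, title in rows)


def first_n_by_title(status: int, type_id: int, n: int) -> list:
    """WHERE status = ? AND type_id = ? ORDER BY title LIMIT n, without any sort."""
    lo = bisect_left(index, (status, type_id))      # first entry of the slice
    hi = bisect_left(index, (status, type_id + 1))  # one past the last entry
    # The slice is already ordered by title, so at most n entries are read.
    return [pk for *_, pk in index[lo:min(hi, lo + n)]]


print(first_n_by_title(2, 59, 3))  # [6, 3, 5]
```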

Your first select needs this composite index. (I take the liberty of removing the "obj_" which just clutters the SQL.)
INDEX(type_id, status, title)
MySQL rarely uses more than one index in a query; this 3-column index is suited for WHERE status=(const) AND type_id=(const) ORDER BY title. I see that it used "index intersect" to try to compensate for the lack of a suitable composite index, but only partially.
Perhaps the optimizer looked at LIKE and said "Punt! I give up on using numeric comparisons, so let's not use either index on type_id or status. Instead, let's see if we can avoid the filesort by using INDEX(title)". And it happened to be better.
There is another thing that makes that filesort especially costly. "Using temporary" and "Filesort" prefer to do everything in RAM via a MEMORY table, but several things can prevent that. One is fetching a TEXT column, which you do (SELECT * includes the description TEXT column). I doubt the optimizer took that into account, but the timings seem to reflect it.
For more tips on indexing, see my index cookbook. Meanwhile, use LIKE only on strings, not numeric values.

Related

MySQL - just adding ORDER BY an indexed field adds 5 minutes for just 52 records. Where to start?

EDIT 2: Now that we have optimized the database, we have narrowed the problem down further in: MySQL - Why is phpMyAdmin extremely slow with this query that is super fast in php/mysqli?
EDIT 1: there are two solutions that helped us. One on database level (configuration) and one on query level. I could of course only accept one as the best answer, but if you are having similar problems, look at both.
We have a database that has been running perfectly fine for years. However, right now, we have a problem that I don't understand. Is it a mysql/InnoDB configuration problem? And we currently have nobody for system maintenance (I am a programmer).
The table TitelDaggegevens is a few gigabytes in size, about 12,000,000 records, so nothing extraordinary.
If we do:
SELECT *
FROM TitelDaggegevens
WHERE fondskosten IS NULL
AND (datum BETWEEN 20200401 AND 20200430)
it runs fine, within a few tenths of a second.
The result: 52 records.
Also if we add ORDER BY datum or if we order by any other non-indexed field: all is well, same speed.
However, if I add ORDER BY id (id being the primary key), suddenly the query takes 15 seconds for the same 52 records.
And when I ORDER BY another indexed field, the query time increases to 4-6 minutes. For ordering 52 records. On an indexed field.
I have no clue what is going on. EXPLAIN doesn't help me. I optimized/recreated the table, checked it, and restarted the server. All to no avail. I am absolutely no expert on configuring MySQL or InnoDB, so I have no clue where to start the search.
I am just hoping that maybe someone recognises this and can point me into the right direction.
SHOW TABLE STATUS WHERE Name = 'TitelDaggegevens'
Gives me:
I know this is a very vague problem, but I am not able to pin it down more specifically. I enabled the logging for slow queries but the table slow_log stays empty. I'm lost.
Thank you for any ideas where to look.
This might help someone who knows more about it than I do: phpMyAdmin's 'Advisor':
In the comments and in one of the answers, people asked for EXPLAIN output:
1) Without ORDER BY and with ORDER BY datum (which is in the WHERE and has an index):
2) With ORDER BY plus any field other than datum (indexed or not, so the same for both quick and slow queries).
The table structure:
CREATE TABLE `TitelDaggegevens` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`isbn` decimal(13,0) NOT NULL,
`datum` date NOT NULL,
`volgendeDatum` date DEFAULT NULL,
`prijs` decimal(8,2) DEFAULT NULL,
`prijsExclLaag` decimal(8,2) DEFAULT NULL,
`prijsExclHoog` decimal(8,2) DEFAULT NULL,
`stadiumDienstverlening` char(2) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`stadiumLevenscyclus` char(1) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`gewicht` double(7,3) DEFAULT NULL,
`volume` double(7,3) DEFAULT NULL,
`24uurs` tinyint(1) DEFAULT NULL,
`UitgeverCode` varchar(4) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`imprintId` int(11) DEFAULT NULL,
`distributievormId` tinyint(4) DEFAULT NULL,
`boeksoort` char(1) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`publishingStatus` tinyint(4) DEFAULT NULL,
`productAvailability` tinyint(4) DEFAULT NULL,
`voorraadAlles` mediumint(8) unsigned DEFAULT NULL,
`voorraadBeschikbaar` mediumint(8) unsigned DEFAULT NULL,
`voorraadGeblokkeerdEigenaar` smallint(5) unsigned DEFAULT NULL,
`voorraadGeblokkeerdCB` smallint(5) unsigned DEFAULT NULL,
`voorraadGereserveerd` smallint(5) unsigned DEFAULT NULL,
`fondskosten` enum('depot leverbaar','depot onleverbaar','POD','BOV','eBoek','geen') COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ISBN+datum` (`isbn`,`datum`) USING BTREE,
KEY `UitgeverCode` (`UitgeverCode`),
KEY `Imprint` (`imprintId`),
KEY `VolgendeDatum` (`volgendeDatum`),
KEY `Index op voorraad om maxima snel te vinden` (`isbn`,`voorraadAlles`) USING BTREE,
KEY `fondskosten` (`fondskosten`),
KEY `Datum+isbn+fondskosten` (`datum`,`isbn`,`fondskosten`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=16519430 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci
Have this to handle the WHERE entirely:
INDEX(fondskosten, Datum)
Note: the = is first, then the range.
Fetch the *. Note: If there are big TEXT or BLOB columns that you don't need, spell out the SELECT list so you can avoid them. They may be stored "off-record", hence take longer to fetch.
An optional ORDER BY. If it is on Datum, then there is no extra effort. If it is on any other column, then there will be a sort. But a sort of 52 rows will be quite fast (milliseconds).
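Here is a rough Python sketch of the "= first, then range" rule. The data is invented: three fondskosten values per day, plus a NULL row on the 10th, 20th and 30th of each month. With INDEX(fondskosten, Datum), the matching entries form one contiguous run. With the columns reversed, the whole date range has to be scanned and filtered. null_first is a hypothetical helper that mimics MySQL sorting NULL first.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def null_first(v):
    """Sort key that puts NULL (None) before any string, like MySQL's ascending order."""
    return (v is not None, v or "")


# Made-up data: three kinds per day for March to May 2020, plus a NULL
# fondskosten row on the 10th, 20th and 30th of each month.
days = [date(2020, 3, 1) + timedelta(days=i) for i in range(92)]
rows = [(kind, d) for d in days for kind in ("POD", "eBoek", "geen")]
rows += [(None, d) for d in days if d.day % 10 == 0]

start, end = date(2020, 4, 1), date(2020, 4, 30)

# INDEX(fondskosten, datum): the '=' column first, then the range column.
eq_then_range = sorted((null_first(k), d) for k, d in rows)
lo = bisect_left(eq_then_range, (null_first(None), start))
hi = bisect_right(eq_then_range, (null_first(None), end))
scanned_eq_first = hi - lo  # every scanned entry is a match

# INDEX(datum, fondskosten): the range comes first, so the whole month is scanned.
range_then_eq = sorted((d, null_first(k)) for k, d in rows)
lo2 = bisect_left(range_then_eq, (start,))
hi2 = bisect_left(range_then_eq, (end + timedelta(days=1),))
scanned_range_first = hi2 - lo2
matches = [e for e in range_then_eq[lo2:hi2] if e[1] == null_first(None)]

print(scanned_eq_first, scanned_range_first, len(matches))  # 3 93 3
```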
Notes:
If you don't have fondskosten IS NULL or you have some other test, then all bets are off. We have to start over in designing the optimal composite index.
USE/FORCE INDEX -- use this as a last resort.
Always provide SHOW CREATE TABLE when needing to discuss a query.
The Advisor has some good stuff, but without any clues of what is "too big", it is rather useless.
I suspect all the other discussions failed to realize that there are far more than 52 rows for the given Datum range. That is, fondskosten IS NULL is really part of the problem and of the solution.
For people searching for tweaks in similar cases, these are the tweaks the specialist made to the database that sped it up considerably. (Mind you, this is for a database with hundreds of tables and MANY very complex and large queries, sometimes joining over 15 tables, but without a super massive number of records: the database is only 37 gigabytes.)
[mysqld]
innodb_buffer_pool_size=2G
innodb_buffer_pool_instances=4
innodb_flush_log_at_trx_commit=2
tmp_table_size=64M
max_heap_table_size=64M
join_buffer_size=4M
sort_buffer_size=8M
optimizer_search_depth=5
The optimizer_search_depth was DECREASED to minimize the time the optimizer needs for the complex queries.
After restarting the server, (regularly) run all queries that are the result of running this query:
SELECT CONCAT('OPTIMIZE TABLE `', TABLE_SCHEMA , '`.`', TABLE_NAME ,'`;') AS query
FROM INFORMATION_SCHEMA.TABLES
WHERE DATA_FREE/DATA_LENGTH > 2 AND DATA_LENGTH > 4*1024*1024
(If you have large tables, run this first series when the server is offline or lightly used. It rebuilds, and thus optimizes, the tables that need it.)
And then:
SELECT CONCAT('ANALYZE TABLE `', TABLE_SCHEMA , '`.`', TABLE_NAME ,'`;') AS query
FROM INFORMATION_SCHEMA.TABLES
WHERE DATA_FREE/DATA_LENGTH > 2 AND DATA_LENGTH > 1*1024*1024
(This second query series is much lighter and less intrusive, but it may still help speed up some queries, because the server recalculates the statistics it uses to choose query plans.)
It looks like ORDER BY uses three different optimization plans:
ORDER BY id - Extra: Using index condition; Using where; Using filesort. MySQL uses filesort to resolve the ORDER BY, even though the rows are already sorted. So it takes 15 seconds.
ORDER BY Datum or another non-indexed field - Extra: Using index condition; Using where. MySQL uses the Datum index to resolve the ORDER BY. It takes a few tenths of a second.
ORDER BY index_field - Extra: Using index condition; Using where; Using filesort. MySQL uses filesort to resolve the ORDER BY, and the rows are unsorted. It takes a few minutes.
This is my guess; only EXPLAIN can tell what's really going on.
Influencing ORDER BY Optimization
Update:
Could you check this query with each of the ORDER BY clauses?
SELECT *
FROM TitelDaggegevens USE INDEX FOR ORDER BY (Datum)
WHERE fondskosten IS NULL
AND (Datum BETWEEN 20200401 AND 20200430)
You may also try increasing the sort_buffer_size:
If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing.
On Linux, there are thresholds of 256KB and 2MB where larger values may significantly slow down memory allocation, so you should consider staying below one of those values.

Very slow when order by id, but fast when order by timestamp, id

I encountered a very puzzling optimization case. I'm no SQL expert but still this case seems to defy my understanding of clustered key principles.
I have the below table schema:
CREATE TABLE `orders` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`chargeQuote` tinyint(1) NOT NULL,
`features` int(11) NOT NULL,
`sequenceIndex` int(11) NOT NULL,
`createdAt` bigint(20) NOT NULL,
`previousSeqId` bigint(20) NOT NULL,
`refOrderId` bigint(20) NOT NULL,
`refSeqId` bigint(20) NOT NULL,
`seqId` bigint(20) NOT NULL,
`updatedAt` bigint(20) NOT NULL,
`userId` bigint(20) NOT NULL,
`version` bigint(20) NOT NULL,
`amount` decimal(36,18) NOT NULL,
`fee` decimal(36,18) NOT NULL,
`filledAmount` decimal(36,18) NOT NULL,
`makerFeeRate` decimal(36,18) NOT NULL,
`price` decimal(36,18) NOT NULL,
`takerFeeRate` decimal(36,18) NOT NULL,
`triggerOn` decimal(36,18) NOT NULL,
`source` varchar(32) NOT NULL,
`status` varchar(50) NOT NULL,
`symbol` varchar(32) NOT NULL,
`type` varchar(50) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_STATUS` (`status`) USING BTREE,
KEY `IDX_USERID_SYMBOL_STATUS_TYPE` (`userId`,`symbol`,`status`,`type`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=7937243 DEFAULT CHARSET=utf8mb4;
This is a big table. 100 million rows. It's already sharded by createdAt, so 100 million = 1 month worth of orders.
I have the slow query below. It is pretty straightforward:
select id,chargeQuote,features,sequenceIndex,createdAt,previousSeqId,refOrderId,refSeqId,seqId,updatedAt,userId,version,amount,fee,filledAmount,makerFeeRate,price,takerFeeRate,triggerOn,source,`status`,symbol,type
from orders where 1=1
and userId=100000
and createdAt >= '1567775174000' and createdAt <= '1567947974000'
and symbol in ( 'BTC_USDT' )
and status in ( 'FULLY_FILLED' , 'PARTIAL_CANCELLED' , 'FULLY_CANCELLED' )
and type in ( 'BUY_LIMIT' , 'BUY_MARKET' , 'SELL_LIMIT' , 'SELL_MARKET' )
order by id desc limit 0,20;
This query takes 24 seconds. The number of rows that satisfy userId=100000 is very little, around 100. And the number of rows that satisfy this entire where clause is 0.
But when I did a small tweak, that is, I changed the order by clause:
order by id desc limit 0,20; -- before
order by createdAt desc, id desc limit 0,20; -- after
It became very fast, 0.03 seconds.
I can see that it made a big difference in the MySQL engine, because EXPLAIN shows that before the change it was using key: PRIMARY, and after the change it finally uses key: IDX_USERID_SYMBOL_STATUS_TYPE, as expected, which I guess is why it is so fast. Here are the EXPLAIN plans:
Before (order by id desc):
select_type: SIMPLE
table: orders
type: index
possible_keys: IDX_STATUS,IDX_USERID_SYMBOL_STATUS_TYPE
key: PRIMARY
key_len: 8
rows: 20360
filtered: 0.02
Extra: Using where
After (order by createdAt desc, id desc):
select_type: SIMPLE
table: orders
type: range
possible_keys: IDX_STATUS,IDX_USERID_SYMBOL_STATUS_TYPE
key: IDX_USERID_SYMBOL_STATUS_TYPE
key_len: 542
rows: 26220
filtered: 11.11
Extra: Using index condition; Using where; Using filesort
So what gives? I was actually very surprised that it was not naturally sorted by id (the PRIMARY KEY). Isn't that the clustered key in MySQL? And why did it choose not to use the index when sorting by id?
I'm very puzzled because a more demanding query (sort by 2 conditions) is super fast but a more lenient query is slow.
And no, I tried ANALYZE TABLE orders; and nothing happened.
MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
1. Read all qualifying rows, sort them, and pick the n top rows.
2. Read the rows in sorted order and stop when n qualifying rows have been found.
In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed, or for columns whose values are correlated. In your case, the MySQL optimizer evidently thinks that the second strategy is the best. In other words, it does not see that the WHERE clause will not be satisfied by any rows, but thinks that 2% of the rows will satisfy the WHERE clause, and that it will be able to find 20 rows by only scanning part of the table backwards in PRIMARY key order.
How the filtering effect of a WHERE clause is estimated varies quite a bit between 5.6, 5.7, and 8.0. If you are using MySQL 8.0, you can try to create histograms for the columns involved to see if that can improve the estimation. If not, I think your only option is to use a FORCE INDEX hint to make the optimizer choose the desired index.
For your fast query, the second strategy is not an option since there is no index on createdAt that can be used to avoid sorting.
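The two strategies can be compared with a small Python model. Everything here is invented for illustration: 100,000 orders, userId 100000 owning every 1000th row, and a dict standing in for the userId index. When no row matches, the ordered scan never collects its 20 rows, so it reads the whole table. That is the slow behaviour described in the question.

```python
# Made-up table: 100,000 orders in id order; userId 100000 owns every 1000th row.
orders = [
    {"id": i, "userId": 100000 if i % 1000 == 0 else 1, "status": "NEW"}
    for i in range(1, 100_001)
]
by_user = {}  # stand-in for a secondary index on userId
for row in orders:
    by_user.setdefault(row["userId"], []).append(row)


def wanted(row, status):
    return row["userId"] == 100000 and row["status"] == status


def plan_sort_then_limit(status, n):
    """Strategy 1: read the user's rows through the index, sort by id DESC, keep n."""
    candidates = by_user.get(100000, [])
    hits = sorted((r for r in candidates if wanted(r, status)),
                  key=lambda r: r["id"], reverse=True)
    return hits[:n], len(candidates)


def plan_ordered_scan(status, n):
    """Strategy 2: walk PRIMARY backwards (id DESC) and stop after n hits."""
    hits, visited = [], 0
    for r in reversed(orders):
        visited += 1
        if wanted(r, status):
            hits.append(r)
            if len(hits) == n:
                break
    return hits, visited


# No row qualifies: strategy 2 never finds 20 hits and reads the whole table.
print(plan_sort_then_limit("FULLY_FILLED", 20)[1])  # 100
print(plan_ordered_scan("FULLY_FILLED", 20)[1])     # 100000
```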
Update:
Reading Rick's answer, I realized that an index on only userId should speed up your ORDER BY id query. In such an index, the entries for a given userId will be sorted on primary key. Hence, using this index will both make it possible to only access the rows of the requested userId, and access the rows in the requested sort order (by id).
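Here is a toy Python sketch of that point, using made-up (id, userId) pairs. InnoDB appends the primary key to every secondary index entry, so the slice for one userId is already in id order. ORDER BY id DESC LIMIT n then just reads the end of that slice backwards. The other WHERE conditions are left out here; in practice they would be checked on each of the user's roughly 100 rows.

```python
from bisect import bisect_left

# Made-up (id, userId) pairs. InnoDB stores the primary key in every secondary
# index entry, so INDEX(userId) behaves like a sorted list of (userId, id).
pairs = [(7, 100000), (3, 5), (12, 100000), (1, 100000), (9, 5), (4, 100000)]
user_index = sorted((user_id, pk) for pk, user_id in pairs)


def latest_ids(user_id: int, n: int) -> list:
    """WHERE userId = ? ORDER BY id DESC LIMIT n, read straight from the index."""
    lo = bisect_left(user_index, (user_id,))
    hi = bisect_left(user_index, (user_id + 1,))
    # The user's slice is already in id order; take its last n entries backwards.
    return [pk for _, pk in reversed(user_index[max(lo, hi - n):hi])]


print(latest_ids(100000, 2))  # [12, 7]
```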
The main filters work well with the cardinality estimator. When ORDER BY is combined with LIMIT, the LIMIT acts as yet another filter, because the data has to be narrowed down further. This can push the cardinality estimator into an inaccurate estimate, which eventually results in a poor plan being selected. To prove this, run the 24-second query without the LIMIT clause. It should also respond in about 0.03 seconds, like your trick.
In order to solve this, if you have a standard very good performance just with the main filters, select this first, and filter at later 2nd time where the result set will be significantly smaller than the whole table. Use something like:
SELECT * FROM (SELECT ... main select statement ...) AS t
ORDER BY x LIMIT y;
...or...
CREATE TEMPORARY TABLE tmp AS SELECT ... main select statement ...;
SELECT * FROM tmp ORDER BY x LIMIT y;
Given
and userId=100000
and createdAt >= '1567775174000' and createdAt <= '1567947974000'
and ... -- I am not making use of the other items
order by createdAt DESC, id desc -- I am assuming this change
limit 0,20;
I would try
INDEX(userId, createdAt, id) -- in this order
1. userId is tested by =, so it comes first, thereby narrowing down the part of the index to look at.
2. Leave out the columns tested by IN. If there are multiple values in an IN, we can't make use of step 4.
3. createdAt filters further by range.
4. createdAt and id are compared in the same direction (DESC). (Yes, I know 8.0 has an improvement, but I don't think you wanted (ASC, DESC).)
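Here is a sketch of how that index would be read, as a toy Python model with invented entries. The IN filters are ignored, as suggested above. Equality on userId plus the createdAt range select one contiguous slice. Reading that slice backwards gives createdAt DESC, id DESC, so only the last n entries are touched.

```python
from bisect import bisect_left

# Made-up entries of INDEX(userId, createdAt, id), kept sorted.
entries = sorted([
    (100000, 1567775174000, 11), (100000, 1567800000000, 15),
    (100000, 1567900000000, 18), (100000, 1567990000000, 21),
    (42, 1567800000000, 16),
])


def newest(user_id: int, t_from: int, t_to: int, n: int) -> list:
    """WHERE userId = ? AND createdAt BETWEEN ? AND ?
    ORDER BY createdAt DESC, id DESC LIMIT n (other predicates ignored)."""
    lo = bisect_left(entries, (user_id, t_from))    # '=' then start of range
    hi = bisect_left(entries, (user_id, t_to + 1))  # just past end of range
    # Reading the slice backwards yields (createdAt DESC, id DESC) directly.
    return [pk for _, _, pk in reversed(entries[max(lo, hi - n):hi])]


print(newest(100000, 1567775174000, 1567947974000, 20))  # [18, 15, 11]
```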

Optimize Indexes for Particular Query in mySQL

I have a fairly simple query that is taking about 14 seconds to complete and I would like to speed it up. I think I have the correct indexes in place, but I'm not sure...
Here is the query
SELECT *
FROM opportunities
WHERE cid = 7785
AND STATUS != 4
AND otype != 200
AND links > 0
AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;
Here is the table schema
CREATE TABLE `opportunities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cid` int(11) NOT NULL,
`url` varchar(900) CHARACTER SET utf8 NOT NULL,
`status` tinyint(4) NOT NULL,
`links` int(11) NOT NULL,
`otype` int(11) NOT NULL,
`reserved` tinyint(4) NOT NULL,
`ontopic` varchar(3) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `cid` (`cid`,`url`),
KEY `cid1` (`cid`),
KEY `url` (`url`),
KEY `otype` (`otype`),
KEY `reserved` (`reserved`),
KEY `ontopic` (`ontopic`),
KEY `status` (`status`),
KEY `links` (`links`),
KEY `ontopic_links` (`ontopic`,`links`),
KEY `cid_status_otype_links_ontopic` (`cid`,`status`,`otype`,`links`,`ontopic`)
) ENGINE=InnoDB AUTO_INCREMENT=13022832 DEFAULT CHARSET=latin1
Here is the result of the EXPLAIN command
id: 1
select_type: Simple
table: opportunities
partitions: null
type: range
possible_keys: cid,cid1,otype,ontopic,status,links,ontopic_links,cid_status_otype_links_ontopic
key: links
keylen: 4
ref: null
rows: 1531552
filtered: 0.33
Extra: Using index condition; Using where
Thoughts / Questions
Am I reading it correctly that it is using the "links" key to do the query? Why wouldn't it use a more complete index, like the cid_status_otype_links_ontopic which covers all the conditions of my query?
Thanks in advance!
As requested:
There are 30,961 rows that match the query when you remove the LIMIT 0,100. Interestingly, a COUNT(*) query returns almost instantly.
It's a funny thing about using inequality comparisons, that they count as range conditions.
That is, equality matches one value, but anything other than equality (!=, >, <, IN, BETWEEN) can match multiple values.
By matching multiple values, it means that only the first column in an index used in a range condition is going to be optimized. You'd think that your index cid_status_otype_links_ontopic has all the columns mentioned in conditions of your query, but only the first two will be used. The first because you have an equality comparison for cid. The second because the next column is used in an inequality comparison, and then that's where it stops using columns from the index.*
Evidence: if you can force that index to be used, you should see the keylen field of the EXPLAIN result show only 5, which is the size of cid (4 bytes) + status (1 byte).
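The column-counting rule can be written down as a short Python function. It simplifies what the answer describes. It ignores the index condition pushdown mentioned in the footnote below. It also ignores nullable columns, which add one byte each to key_len. The byte widths come from the table definition: INT is 4 bytes, TINYINT is 1.

```python
# Byte widths of the NOT NULL columns involved (INT = 4 bytes, TINYINT = 1 byte).
WIDTH = {"cid": 4, "status": 1, "otype": 4, "links": 4}


def usable_prefix(index_cols, predicates):
    """Simplified rule: use index columns left to right while they have '='
    predicates; the first column with any other comparison is used as a range
    and ends the prefix."""
    used = []
    for col in index_cols:
        op = predicates.get(col)
        if op is None:  # column not in the WHERE clause: stop
            break
        used.append(col)
        if op != "=":   # range condition: nothing after it is used
            break
    return used


query = {"cid": "=", "status": "!=", "otype": "!=", "links": ">", "ontopic": "!="}
used = usable_prefix(["cid", "status", "otype", "links", "ontopic"], query)
key_len = sum(WIDTH[c] for c in used)
print(used, key_len)  # ['cid', 'status'] 5
```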
The MySQL optimizer apparently has predicted that it would be more beneficial to use your links index, because that allows it to access the rows in index order, which is the same as the sort order you requested with your ORDER BY.
Evidence: you don't see "Using filesort" in your EXPLAIN notes.
Is that really better than using one of the other indexes? Maybe, maybe not. The optimizer's predictions aren't always perfect.
You can use an index hint to override the optimizer's choice:
SELECT * FROM opportunities USE INDEX (cid_status_otype_links_ontopic) WHERE ...
Try that out, do the EXPLAIN of that query and compare it to your other EXPLAIN. Then execute both queries and see which is reliably faster.
(* Actually, I have to add a footnote about the index column usage. MySQL 5.6 and later can do a little bit better than just the two columns, when you see the note "Using Index Condition" in the EXPLAIN. But it's not quite the same. You can read more about that here: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html)
What you have must plow through all of the rows, using your 5-column index, then sort the results and deliver 100 rows.
The only index likely to be useful is INDEX(cid, links). This is because cid is the only column being tested with =, then having links might be useful for the ORDER BY and LIMIT. There is still the risk that the != tests will require filtering a lot of rows.
Are status and otype multi-valued? If either has only 2 values, then turning the != into = and adding it to the index would be beneficial.
Do you really need all the columns (SELECT *)? If not, and if you don't need any big columns (url), then you could go with a 'covering' index.
More on writing indexes.

MySQL Multiple column index

Ok, I have the following MySQL table structure:
CREATE TABLE `creditlog` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`memberId` int(10) unsigned NOT NULL,
`quantity` decimal(10,2) unsigned DEFAULT NULL,
`timeAdded` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`reference` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `memberId` (`memberId`),
KEY `timeAdded` (`timeAdded`));
And I'm querying it like this:
SELECT SUM(quantity) FROM creditlog where timeAdded>'2016-09-01' AND timeAdded<'2016-10-01' AND memberId IN (3,6,8,9,11)
I also use USE INDEX (timeAdded), because with this number of rows it works better. EXPLAIN for the above query shows:
type -> range,
key -> timeAdded,
rows -> 921294
extra -> using where
Meanwhile if I use the memberId INDEX it shows:
type -> range,
key -> memberId,
rows -> 1707849
extra -> using where
My question is whether it's possible to combine these two indexes so that they are used together and narrow down the search, since I'll also need to add more conditions (on other columns).
MySQL almost never uses two indexes in a single query; it is just not cost effective. However, composite indexes are often very efficient. You need this order: INDEX(memberId, timeAdded).
Build the index this way:
1. First include the column(s) in the WHERE clause that are tested with =. (None, in your case.)
2. Any column(s) tested with IN.
3. One 'range', such as <, BETWEEN, etc.
4. Move on to all the fields of the GROUP BY or ORDER BY. (Not relevant here.)
There are a lot of exceptions and caveats. Some are given in my cookbook.
(Contrary to popular opinion, cardinality is almost never relevant in designing an index.)
Here is a way to compare two indexes (even with a table that is too small to get reliable timings):
FLUSH STATUS;
SELECT SQL_NO_CACHE ...;
SHOW SESSION STATUS LIKE 'Handler%';
(repeat for other query/index)
Smaller numbers almost always indicate better.
"timeAdded>'2016-09-01' AND timeAdded<'2016-10-01'" -- That excludes midnight on the first day. I recommend this pattern:
timeAdded >= '2016-09-01'
AND timeAdded < '2016-09-01' + INTERVAL 1 MONTH
That also saves you from computing the last day of the month.
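The difference between the two range patterns can be checked in a few lines of Python, with datetimes standing in for TIMESTAMP values. Python has no built-in "add one month". next_month is therefore a hypothetical helper that does what + INTERVAL 1 MONTH does for a first-of-month date.

```python
from datetime import date, datetime


def next_month(first_day: date) -> date:
    """First day of the following month (for a first-of-month input, the same
    result as adding INTERVAL 1 MONTH in SQL)."""
    if first_day.month == 12:
        return date(first_day.year + 1, 1, 1)
    return date(first_day.year, first_day.month + 1, 1)


START = datetime(2016, 9, 1)
END = datetime.combine(next_month(START.date()), datetime.min.time())


def open_range(t: datetime) -> bool:
    # timeAdded > '2016-09-01' AND timeAdded < '2016-10-01'
    return datetime(2016, 9, 1) < t < datetime(2016, 10, 1)


def half_open(t: datetime) -> bool:
    # timeAdded >= '2016-09-01' AND timeAdded < '2016-09-01' + INTERVAL 1 MONTH
    return START <= t < END


midnight = datetime(2016, 9, 1, 0, 0, 0)
print(open_range(midnight), half_open(midnight))  # False True
```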
That looks like a common query. Have you considered building and maintaining summary tables? The equivalent query would probably run 10 times as fast.
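Here is a minimal Python sketch of the idea. It assumes a hypothetical summary keyed by (memberId, day) that is updated alongside every insert into creditlog. The monthly SUM then costs one lookup per member per day, regardless of how many raw creditlog rows exist. In MySQL this would be a real table, for instance with PRIMARY KEY(memberId, day).

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from decimal import Decimal

# Hypothetical summary table keyed like PRIMARY KEY(memberId, day),
# holding SUM(quantity) for that member and day.
daily_totals = defaultdict(Decimal)


def record_credit(member_id: int, quantity: Decimal, time_added: datetime) -> None:
    """Run this alongside each INSERT INTO creditlog to keep the summary current."""
    daily_totals[(member_id, time_added.date())] += quantity


def sum_quantity(member_ids, start: date, end: date):
    """Sum over [start, end): one lookup per member per day, however many raw rows exist."""
    total, lookups = Decimal(0), 0
    day = start
    while day < end:
        for member_id in member_ids:
            lookups += 1
            total += daily_totals.get((member_id, day), Decimal(0))
        day += timedelta(days=1)
    return total, lookups


# 10,000 made-up credits over the first week of September 2016.
for i in range(10_000):
    record_credit(3 if i % 2 else 7, Decimal("1.50"), datetime(2016, 9, 1) + timedelta(minutes=i))

print(sum_quantity({3, 6, 8, 9, 11}, date(2016, 9, 1), date(2016, 10, 1)))
```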

MYSQL performance slow using filesort

I have a simple MySQL query, but with a lot of records (currently 103,000) the performance is really slow, and EXPLAIN says it is using filesort. I'm not sure whether that is why it is slow. Does anyone have suggestions to speed it up, or to stop it from using filesort?
MYSQL query :
SELECT *
FROM adverts
WHERE (price >= 0)
AND (status = 1)
AND (approved = 1)
ORDER BY date_updated DESC
LIMIT 19990, 10
The Explain results :
id: 1
select_type: SIMPLE
table: adverts
type: range
possible_keys: price
key: price
key_len: 4
ref: NULL
rows: 103854
Extra: Using where; Using filesort
Here is the adverts table and indexes:
CREATE TABLE `adverts` (
`advert_id` int(10) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL,
`type_id` tinyint(1) NOT NULL,
`breed_id` int(10) NOT NULL,
`advert_type` tinyint(1) NOT NULL,
`headline` varchar(50) NOT NULL,
`description` text NOT NULL,
`price` int(4) NOT NULL,
`postcode` varchar(7) NOT NULL,
`town` varchar(60) NOT NULL,
`county` varchar(60) NOT NULL,
`latitude` float NOT NULL,
`longitude` float NOT NULL,
`telephone1` varchar(15) NOT NULL,
`telephone2` varchar(15) NOT NULL,
`email` varchar(80) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`approved` tinyint(1) NOT NULL DEFAULT '0',
`date_created` datetime NOT NULL,
`date_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`expiry_date` datetime NOT NULL,
PRIMARY KEY (`advert_id`),
KEY `price` (`price`),
KEY `user` (`user_id`),
KEY `type_breed` (`type_id`,`breed_id`),
KEY `headline_keywords` (`headline`),
KEY `date_updated` (`date_updated`),
KEY `type_status_approved` (`advert_type`,`status`,`approved`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The problem is that MySQL only uses one index when executing the query. If you add a new index that uses the 3 fields in your WHERE clause, it will find the rows faster.
ALTER TABLE `adverts` ADD INDEX price_status_approved(`price`, `status`, `approved`);
According to the MySQL documentation ORDER BY Optimization:
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the ORDER BY.
This is what happens in your case.
As the output of EXPLAIN tells us, the optimizer uses the key price to find the rows. However, the ORDER BY is on the field date_updated which does not belong to the key price.
To find the rows faster AND sort the rows faster, you need to add an index that contains all the fields used in the WHERE and in the ORDER BY clauses:
ALTER TABLE `adverts` ADD INDEX status_approved_date_updated(`status`, `approved`, `date_updated`);
The field used for sorting must be in the last position in the index. It is useless to include price in the index, because the condition used in the query will return a range of values.
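Here is a toy Python model of that index. The data is invented, and price is left out because every sample row would pass price >= 0. The two equality columns select one slice that is already ordered by date_updated. Walking it backwards therefore gives ORDER BY date_updated DESC without a filesort. The model also shows that LIMIT 19990, 10 still has to read and discard the first 19,990 matching entries.

```python
from bisect import bisect_left

# Made-up adverts: (advert_id, status, approved, date_updated as a Unix timestamp).
adverts = [(i, i % 2, 1, 1_600_000_000 + i) for i in range(1, 50_001)]

# Toy INDEX(status, approved, date_updated), with the row id appended.
index = sorted((status, approved, updated, aid) for aid, status, approved, updated in adverts)


def page(status: int, approved: int, offset: int, n: int):
    """WHERE status = ? AND approved = ? ORDER BY date_updated DESC LIMIT offset, n.

    price >= 0 is left out: every made-up row would pass it.
    """
    lo = bisect_left(index, (status, approved))
    hi = bisect_left(index, (status, approved + 1))
    visited, out = 0, []
    for pos in range(hi - 1, lo - 1, -1):  # walk backwards = DESC, no sort needed
        visited += 1
        if visited > offset:               # entries before the offset are read and discarded
            out.append(index[pos][3])
            if len(out) == n:
                break
    return out, visited


ids, visited = page(1, 1, 19990, 10)
print(ids[0], visited)  # 10019 20000
```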
If EXPLAIN still shows that it is using filesort, you may try forcing MySQL to use an index you choose:
SELECT adverts.*
FROM adverts
FORCE INDEX(status_approved_date_updated)
WHERE price >= 0
AND adverts.status = 1
AND adverts.approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10
It is usually not necessary to force an index, because the MySQL optimizer most often makes the correct choice. But sometimes it makes a bad choice, or not the best one. You will need to run some tests to see whether it improves performance.
Remove the ticks around the '0' - it currently may prevent using the index but I am not sure.
Nevertheless it is better style since price is int type and not a character column.
SELECT adverts.*
FROM adverts
WHERE price >= 0
AND adverts.status = 1
AND adverts.approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10
MySQL does not use the date_updated key for sorting; it just uses the price key, because price appears in the WHERE clause. You could try to use index hints:
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Add something like
USE KEY FOR ORDER BY (date_updated)
I have two suggestions. First, remove the quotes around the zero in your where clause. That line should be:
price >= 0
Second, create this index:
CREATE INDEX `helper` ON `adverts`(`status`,`approved`,`price`,`date_updated`);
This should allow MySQL to find the 10 rows specified by your LIMIT clause by using only the index. Filesort itself is not a bad thing... the number of rows that need to be processed is.
Your WHERE condition uses price, status, approved to select, and then date_updated is used to sort.
So you need a single index with those fields; I'd suggest indexing on approved, status, price and date_updated, in this order.
The general rule is placing WHERE equalities first, then ranges (more than, less or equal, between, etc), and sorting fields last. (Note that leaving one field out might make the index less usable, or even unusable, for this purpose).
CREATE INDEX advert_ndx ON adverts (approved, status, price, date_updated);
This way, access to the table data is only needed after LIMIT has worked its magic, and you will slow-retrieve only a small number of records.
I'd also remove any unneeded indexes, which would speed up INSERTs and UPDATEs.