I'd like MySQL to use the index to sort these rows.
SELECT
identity_ID
FROM
identity
WHERE
identity_modified > 1257140905
ORDER BY
identity_modified
However, this is using a filesort for sorting (undesirable).
Now, if I leave off the ORDER BY clause here, the rows come out sorted simply as a consequence of using the index to satisfy the WHERE clause.
So, I can get the behaviour I want by leaving off the ORDER BY clause, but then I'm relying on MySQL's internal behaviour staying consistent for the rows to arrive in order, and I might get stung in future if MySQL changes it.
What should I do? Is there any way of telling MySQL that, since the index is stored in order (B-tree), it doesn't need a filesort for this?
The table looks like this (simplified):
CREATE TABLE IF NOT EXISTS `identity` (
`identity_ID` int(11) NOT NULL auto_increment,
`identity_modified` int(11) NOT NULL,
PRIMARY KEY (`identity_ID`),
KEY `identity_modified` (`identity_modified`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
The solution I have so far is to leave off the ORDER BY clause, and add a FORCE INDEX (identity_modified) to the query. This should in theory ensure that the rows are returned in the order they are stored in the index.
It's probably not a best practice way to do it, but it seems to be the only way that works the way I want.
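One possible variation on this workaround, sketched here as an assumption rather than a tested fix: keep the ORDER BY so that the ordering is guaranteed by the query itself, and combine it with the index hint. Whether "Using filesort" actually disappears from the Extra column depends on the MySQL version and the data, so it is worth checking with EXPLAIN.

```sql
-- Keep ORDER BY for a guaranteed order; the hint only pushes the optimizer
-- towards reading identity_modified in index order.
EXPLAIN
SELECT identity_ID
FROM identity FORCE INDEX (identity_modified)
WHERE identity_modified > 1257140905
ORDER BY identity_modified;
```

If the Extra column no longer lists "Using filesort", the rows are being read in index order and no separate sort pass is needed.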
If you change the table type to InnoDB, there won't be a filesort:
ALTER TABLE mydb.identity ENGINE = INNODB;
But I'm not sure there's a problem with your query:
http://forums.mysql.com/read.php?24,10738,10785#msg-10785
Run your original query with EXPLAIN before and after the operation.
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
Dear StackOverflow Members
It's my first post, so please be nice :-)
I have a strange SQL behavior which I can't explain, and I can't find any resources that explain it.
I have built a web honeypot which records all accesses and attacks and displays them on a statistics page.
However, as the data has grown, generating the statistics page has become slower and slower.
I narrowed it down to some SELECT statements which take quite a long time.
The "issue" seems to be an index on a specific column.
*For sure the real issue is my lack of knowledge :-)
Database: mysql
DB schema
Event Table (removed unrelated columns):
Event table size: 30MB
Event table records: 335k
CREATE TABLE `event` (
`EventID` int(11) NOT NULL,
`EventTime` datetime NOT NULL DEFAULT current_timestamp(),
`WEBURL` varchar(50) COLLATE utf8_bin DEFAULT NULL,
`IP` varchar(15) COLLATE utf8_bin NOT NULL,
`AttackID` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
ALTER TABLE `event`
ADD PRIMARY KEY (`EventID`),
ADD KEY `AttackID` (`AttackID`);
ALTER TABLE `event`
ADD CONSTRAINT `event_ibfk_1` FOREIGN KEY (`AttackID`) REFERENCES `attack` (`AttackID`);
Attack Table
attack table size: 32KB
attack Table records: 11
CREATE TABLE attack (
`AttackID` int(4) NOT NULL,
`AttackName` varchar(30) COLLATE utf8_bin NOT NULL,
`AttackDescription` varchar(70) COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
ALTER TABLE `attack`
ADD PRIMARY KEY (`AttackID`);
SLOW Query:
SELECT Count(EventID), IP
FROM event
WHERE AttackID > 0
GROUP BY IP
ORDER BY Count(EventID) DESC
LIMIT 5;
RESULT: 5 rows in set (1.220 sec)
(This seems quite long to me for a simple query.)
Now the strange thing:
If I remove the foreign key relationship, the performance of the query stays the same.
But if I remove the index on event.AttackID, the same SELECT statement is much faster:
(ALTER TABLE `event` DROP INDEX `AttackID`;)
The result of the SQL SELECT query:
5 rows in set (0.242 sec)
From my understanding indexes on columns which are used in "WHERE" should improve the performance.
Why does removing the index have such an impact on the query?
What can I do to keep the relation between the tables and still have a faster SELECT execution?
Cheers
Why does removing the index improve performance?
The query optimizer has multiple ways to resolve a query. For instance, two methods for filtering data are:
Look up the rows that match the where clause in the index and then fetch related data from the data pages.
Scan the index.
This doesn't get into the use of indexes for joins or aggregations or alternative algorithms.
Which is better? Under some circumstances, the first method is horribly slower than the second. This occurs when the data for the table does not fit into memory. Under such circumstances, the index can read a record from page 124 and then from 1068 and then from 124 again and -- well, all sorts of random intertwined reading of pages. Reading data pages in order is usually faster. And when the data doesn't fit into memory, thrashing occurs, which means that a page in memory is aged (overwritten) -- and then needed again.
I'm not saying that is occurring in your case. I am simply saying that what optimizers do is not always obvious. The optimizer has to make judgements based on the nature of the data -- and those judgements are not right 100% of the time. They are usually correct. But there are borderline cases. Sometimes, the issue is out-of-date statistics. Sometimes the issue is that what looks best to the optimizer is not best in practice.
Let me emphasize that optimizers usually do a very good job, and a better job than a person would do. Even if they occasionally come up with suboptimal plans, they are still quite useful.
Get rid of your redundant UNIQUE KEYs. A primary key is a unique key.
Use COUNT(*) rather than COUNT(EventID) in your query. They mean the same thing because you declared EventID to be NOT NULL.
Your query can be much faster if you stop saying WHERE AttackId>0. Because that column is a FK to the PK of your other table, those values should be nonzero anyway. But to get that speedup you'll need an index on event(IP) something like this.
CREATE INDEX IpDex ON event (IP)
But you're still summarizing a large table, and that will always take time.
It looks like you want to display some kind of leaderboard. You could add a top_ips table, and use an EVENT to populate it, using your query, every few minutes. Then you could display it to your users without incurring the cost of the query every time. This of course would display slightly stale data; only you know whether that's acceptable in your app.
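A minimal sketch of that idea follows. Only the top_ips name comes from the suggestion above; the hits column, the refresh_top_ips event name, and the five-minute interval are assumptions for illustration. DELIMITER is a mysql command-line client directive, so other clients may need the event body submitted differently.

```sql
-- Hypothetical summary table; column names are illustrative.
CREATE TABLE top_ips (
  IP   varchar(15) COLLATE utf8_bin NOT NULL,
  hits int NOT NULL,
  PRIMARY KEY (IP)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

-- The event scheduler must be running for events to fire.
SET GLOBAL event_scheduler = ON;

DELIMITER //
CREATE EVENT refresh_top_ips
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  -- Swap the snapshot inside one transaction so that readers
  -- should not observe a half-filled table.
  START TRANSACTION;
  DELETE FROM top_ips;
  INSERT INTO top_ips (IP, hits)
    SELECT IP, COUNT(*)
    FROM `event`
    WHERE AttackID > 0
    GROUP BY IP
    ORDER BY COUNT(*) DESC
    LIMIT 5;
  COMMIT;
END//
DELIMITER ;
```

The statistics page would then read the small table, for example with SELECT IP, hits FROM top_ips ORDER BY hits DESC, instead of aggregating the whole event table on every request.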
Pro Tip. Read https://use-the-index-luke.com by Markus Winand.
Essentially every part of your query, except for the FKey, conspires to make the query slow.
Your query is equivalent to
SELECT Count(*), IP
FROM event
WHERE AttackID >0
GROUP BY IP
ORDER BY Count(*) DESC
LIMIT 5;
Please use COUNT(*) unless you need to avoid NULL.
If AttackID is rarely >0, the optimal index is probably
ADD INDEX(AttackID, -- for filtering
IP) -- for covering
Else, the optimal index is probably
ADD INDEX(IP, -- to avoid sorting
AttackID) -- for covering
You could simply add both indexes and let the Optimizer decide. Meanwhile, get rid of these, if they exist:
DROP INDEX AttackID
DROP INDEX IP
because any uses of them are handled by the new indexes.
Furthermore, leaving the 1-column indexes around can confuse the Optimizer into using them instead of the covering index. (This seems to be a design flaw in at least some versions of MySQL/MariaDB.)
"Covering" means that the query can be performed entirely in the index's BTree. EXPLAIN will indicate it with "Using index". A "covering" index speeds up a query by 2x -- but there is a very wide variation on this prediction. ("Using index condition" is something different.)
More on index creation: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
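Put together as complete statements, the suggestion above might look like the sketch below. The index names idx_attack_ip and idx_ip_attack are hypothetical, and the single-column index is dropped in a separate statement on the assumption that the composite index starting with AttackID is then available to back the foreign key.

```sql
-- Index names are hypothetical.
ALTER TABLE `event`
  ADD INDEX idx_attack_ip (AttackID, IP),   -- filter on AttackID, covers IP
  ADD INDEX idx_ip_attack (IP, AttackID);   -- read in IP order, covers AttackID

-- The foreign key can use an index whose leading column is AttackID,
-- so the single-column index should no longer be required.
ALTER TABLE `event` DROP INDEX AttackID;

-- Check which index the optimizer picks and whether Extra shows "Using index".
EXPLAIN
SELECT COUNT(*), IP
FROM `event`
WHERE AttackID > 0
GROUP BY IP
ORDER BY COUNT(*) DESC
LIMIT 5;
```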
I have a partitioned table in MySQL that looks like this:
CREATE TABLE `table1` (
`id` bigint(19) NOT NULL AUTO_INCREMENT,
`field1` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`field2_id` int(11) NOT NULL,
`created_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`,`created_at`),
KEY `index1` (`field2_id`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=603221206 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/*!50100 PARTITION BY RANGE (to_days(created_at))
(PARTITION p_0 VALUES LESS THAN (730485) ENGINE = InnoDB,
..... lots more partitions .....
PARTITION p_20130117 VALUES LESS THAN (735250) ENGINE = InnoDB) */;
And this is a typical SELECT query on the table:
SELECT field1 FROM table1 WHERE field2_id = 12345 AND id > 13314313;
Doing an explain on it, MySQL sometimes decides to use PRIMARY instead of index1. This seems to be pretty consistent when you do a first explain. However, after a few repeated explains, MySQL finally decides to use the index. The problem is, this table has millions of rows, and inserts and selects are hitting it on the order of several times per second. Choosing the wrong index was causing these SELECT queries to take up to ~40 seconds, instead of sub second times. Can't really schedule downtime, so I can't run an optimize on the table (because of the size, it would probably take a long time), and not sure it would help in this case anyway.
I fixed this by forcing the index, so it looks like this:
SELECT field1 FROM table1 FORCE INDEX (index1) WHERE field2_id = 12345 AND id > 13314313;
We're running this on MySQL 5.1.63, which we can't move away from at the moment.
My question is, why is MySQL choosing the wrong index? And is there something that can be done to fix it, besides forcing the index on all queries? Is partitioning confusing the InnoDB engine? I've worked a lot with MySQL, and have never seen this behavior before. The query is as simple as can be, and the index is also a perfect match. We have a lot of queries that are assuming the DB layer will do the right thing, and I don't want to go through all of them forcing to use the correct index.
Update 1:
This is the typical explain, without the FORCE INDEX clause. Once that's put in, the possible_keys column only shows the forced index.
id  select_type  table   type   possible_keys   key     key_len  ref   rows
1   SIMPLE       table1  range  PRIMARY,index1  index1  12       NULL  207
I'm not 100% sure, but I think this sounds logical:
You partition your table BY RANGE (to_days(created_at)), and the created_at field is part of the primary key. Your SELECT queries use the other part of the primary key (id). So the optimizer thinks PRIMARY would be the speediest index, using the partition and the id part of the primary key.
I suggest (without knowing the real reasons that led to your choice) changing your partition range to the id and changing the column order of your index1 key.
for more information on partitioning have a look
I'm not sure why the engine would pick the incorrect index. I would think that an index that has an EQUALITY test would supersede that of one with a >, < or range. However, another option that might help force the correct index would be to force a "computed" value on the other id column so the engine might not be able to do a direct correlation to the index... Something like
WHERE field2_id = 12345 and id > 13314313
changed to
WHERE field2_id = 12345 and id + 0 > 13314313
Having this table:
CREATE TABLE `example` (
`id` int(11) unsigned NOT NULL auto_increment,
`keywords` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
We would like to optimize the following query:
SELECT id FROM example WHERE keywords LIKE '%whatever%'
The table is InnoDB (so no FULLTEXT for now). Which would be the best index to use in order to optimize such a query?
We've tried a simple:
ALTER TABLE `example` ADD INDEX `idxSearch` (`keywords`);
But an EXPLAIN of the query shows that it needs to scan the whole table.
If our queries were LIKE 'whatever%' instead, this index would perform well, but otherwise it has no value.
Is there any way to optimize this for InnoDB?
Thanks!
Indexes are built from the start of the string towards the end. When you use LIKE 'whatever%' type clause, MySQL can use those start-based indexes to look for whatever very quickly.
But switching to LIKE '%whatever%' removes that anchor at the start of the string. Now the start-based indexes can't be used to look the value up, because your search term is no longer anchored at the start of the string: it's "floating" somewhere in the middle, and the entire field has to be searched. A LIKE '%...' query can never use an index to narrow down the rows.
That's why you use fulltext indexes if all you're doing are 'floating' searches, because they're designed for that type of usage.
Of major note: InnoDB supports fulltext indexes as of version 5.6.4. So unless you can't upgrade to at least 5.6.4, there's nothing holding you back from using InnoDB AND fulltext searches.
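As a rough sketch of what that could look like on the example table (the index name ftSearch is hypothetical, and InnoDB on MySQL 5.6.4 or later is assumed):

```sql
-- Requires MySQL 5.6.4 or later for InnoDB; the index name is hypothetical.
ALTER TABLE `example` ADD FULLTEXT INDEX ftSearch (`keywords`);

SELECT id
FROM example
WHERE MATCH(keywords) AGAINST('whatever' IN NATURAL LANGUAGE MODE);
```

Note that a fulltext search matches whole words rather than arbitrary substrings, and it ignores stopwords and very short tokens, so it is not a drop-in replacement for LIKE '%whatever%' in every case.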
I would like to comment that, surprisingly, creating an index also helped speed up LIKE '%abc%' queries in my case.
Running MySQL 5.5.50 on Ubuntu (leaving everything on default), I have created a table with a lot of columns and inserted 100,000 dummy entries. In one column, I inserted completely random strings with 32 characters (i.e. they are all unique).
I ran some queries and then added an index on this column.
A simple
select id, searchcolumn from table_x where searchcolumn like '%ABC%'
returns a result in ~2 seconds without the index and in 0.05 seconds with the index.
This does not fit the explanations above (and in many other posts). What could be the reason for that?
EDIT
I have checked the EXPLAIN output. The output says rows is 100,000, but Extra info is "Using where; Using index". So somehow, the DBMS has to search all rows, but still is able to utilise the index?
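A likely explanation, offered as an assumption since the table definition isn't shown: "Using index" in the Extra column means the query is answered from the index alone. The query selects only id and searchcolumn, and if the table is InnoDB with id as the primary key, a secondary index on searchcolumn also stores id, so the index covers the query. MySQL still examines all 100,000 entries, but it scans the narrow index instead of the much wider rows. A sketch for checking this (the index name is hypothetical):

```sql
CREATE INDEX idx_searchcolumn ON table_x (searchcolumn);

-- Only indexed columns are selected: Extra would be expected to show
-- "Using where; Using index" (a full scan of the index, not the table).
EXPLAIN SELECT id, searchcolumn FROM table_x WHERE searchcolumn LIKE '%ABC%';

-- Columns outside the index are needed: a full table scan would be expected.
EXPLAIN SELECT * FROM table_x WHERE searchcolumn LIKE '%ABC%';
```

If the second query is as slow as the original unindexed one, that would support the covering-index explanation and remain consistent with the answers above: the index is not used to find the matching rows, only as a smaller structure to scan.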
I am having some difficulties finding an answer to this question...
For simplicity, let's use this situation.
I create a table like this:
CREATE TABLE `test` (
`MerchID` int(10) DEFAULT NULL,
KEY `MerchID` (`MerchID`)
) ENGINE=InnoDB AUTO_INCREMENT=32769 DEFAULT CHARSET=utf8;
I will insert some data into the column of this table...
INSERT INTO test
SELECT 1
UNION
SELECT 2
UNION
SELECT null
Now I examine the query using MYSQL's explain feature...
EXPLAIN
SELECT * FROM test
WHERE merchid IS NOT NULL
Resulting in ID=1
,select_type=SIMPLE
,table=test
,type=index
,possible_keys=MerchID
,key=MerchID
,key_len=5
,ref=NULL
,rows=3
,Extra=Using where; Using index
In production, in my real procedure, something like this takes a long time with this index. If I redeclare the table with the index line reading "KEY MerchID (MerchID) USING BTREE", I get much better results. The explain feature seems to return the same results too. I have read some basics about the BTREE, HASH and RTREE storage types for indexes/keys. When no storage type is specified, I was under the assumption that BTREE would be assumed. However, I am kind of stumped as to why my procedure seems to fly when I modify my index to use this storage type. Any ideas?
I am using MySQL 5.1 and coding in MySQL Workbench. The part of the procedure that appears to be held up is like the one I illustrated above, where the column of a joined table is tested for NULL.
I think you are on the wrong path. For InnoDB storage, the only available index method is BTREE, so you are safe to omit the BTREE keyword from your table creation script. Supported index types are listed here, along with other useful information.
The performance issue is coming from a different place.
Whenever testing performance, be sure to always use the SQL_NO_CACHE directive, otherwise, with query caching, the second time you run a query, your results may be returned a lot faster simply due to caching.
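For the test table above, that might look like the following sketch. It applies to servers that still have the query cache, such as the 5.1 in the question; the query cache was removed in MySQL 8.0.

```sql
-- SQL_NO_CACHE asks the server to bypass the query cache for this statement.
SELECT SQL_NO_CACHE * FROM test WHERE MerchID IS NOT NULL;
```

Repeated runs can still get faster because data and index pages stay in the InnoDB buffer pool, so it helps to compare several runs of each variant rather than a single timing.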
With a covering index (all of the selected and filtered columns are in the index), the query is rather efficient. Using index in the EXPLAIN result shows that it's being used as a covering index.
However, if the index were not a covering index, MySQL would have to perform a seek for each row returned by the index in order to grab the actual table data. While this would still be fast for a small result set, with a result set of 1 million rows, that would be 1 million seeks. If the matching (non-NULL) rows made up a high percentage of the table, MySQL would abandon the index altogether to avoid the seeks.
Ensure that your real "production" index is a covering index as well.
If I have the following table:
CREATE TABLE `mytable` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(64) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_first_letter` (`name`(1)),
KEY `name_all` (`name`)
)
Will MySQL ever choose to use the name_first_letter index over the name_all index? If so, under what conditions would this happen?
I have done some quick tests and I'm not sure if MySQL will choose the name_first_letter index even when using index hints:
-- This uses name_all
EXPLAIN SELECT name FROM mytable
WHERE SUBSTRING(name FROM 1 FOR 1) = 'T';
-- This uses no index at all
EXPLAIN SELECT name FROM mytable USE INDEX (name_first_letter)
WHERE SUBSTRING(name FROM 1 FOR 1) = 'T';
Can any MySQL gurus shed some light on this? Is there even a point to having name_first_letter on this column?
Edit: Question title wasn't quite right.
It does not make sense to use the name_first_letter index for your query, because you are selecting the full name column. That means that MySQL cannot use that index alone to satisfy the query.
Further, I believe that MySQL cannot understand that the SUBSTRING(name FROM 1 FOR 1) expression is equivalent to the index.
MySQL might, however, use the index if the index alone can satisfy the query. For example:
select count(*)
from mytable
where name like 'T%';
But even that depends on your statistics (hinting should work).
MySQL's partial index feature (an index on a column prefix) is intended to save space. It usually does not make sense to have both the partial and the full-column index; you would typically drop the shorter one. There might be a rare case where keeping both makes sense, but not in general.
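A short sketch of how one might check this on mytable before dropping the shorter index. The outcome is not guaranteed: the plan depends on the data and the MySQL version, so the EXPLAIN output needs to be inspected rather than assumed.

```sql
-- Hint the prefix index for the COUNT(*) query above and inspect the key column.
EXPLAIN
SELECT COUNT(*)
FROM mytable USE INDEX (name_first_letter)
WHERE name LIKE 'T%';

-- Same query without a hint, to see which index the optimizer prefers.
EXPLAIN
SELECT COUNT(*)
FROM mytable
WHERE name LIKE 'T%';

-- If name_first_letter shows no benefit over name_all, it is redundant.
ALTER TABLE mytable DROP INDEX name_first_letter;
```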