MySQL seems to ignore index on datetime column - mysql

I am having a pretty severe performance problem when comparing or manipulating two datetime columns in a MySQL table. I have a table tbl that's got around 10 million rows. It looks something like this:
`id` int(11) NOT NULL AUTO_INCREMENT,
`last_interaction` datetime NOT NULL DEFAULT '1970-01-01 00:00:00',
`last_maintenence` datetime NOT NULL DEFAULT '1970-01-01 00:00:00',
PRIMARY KEY (`id`),
KEY `last_maintenence_idx` (`last_maintenence`),
KEY `last_interaction_idx` (`last_interaction`)
) ENGINE=InnoDB AUTO_INCREMENT=12389814 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When I run this query, it takes about a minute to execute:
SELECT id FROM tbl
WHERE last_maintenence < last_interaction
ORDER BY last_maintenence ASC LIMIT 200;
Describing the query renders this:
id: 1
select_type: SIMPLE
table: tbl
type: index
possible_keys: NULL
key: last_maintenence_idx
key_len: 5
ref: NULL
rows: 200
Extra: Using where
It looks like MySQL isn't finding/using the index on last_interaction. Does anybody know why that might be and how to trigger it?

That is because MySQL normally uses only one index per table in a query. Unfortunately, your query is not as simple as it looks: the WHERE clause compares two columns against each other, which no single-column index can satisfy. The solution would be to create a composite index involving both columns.
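A sketch of that suggestion (the index name is illustrative). Note that a B-tree index still cannot seek on a column-to-column comparison; the practical benefit is that a two-column index lets MySQL resolve the query from the index alone rather than the full rows:

```sql
-- Illustrative composite index covering both datetime columns.
-- MySQL still cannot seek on `last_maintenence < last_interaction`,
-- but it can scan this index instead of the 10M-row table:
ALTER TABLE tbl
    ADD INDEX last_maint_inter_idx (last_maintenence, last_interaction);
```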

The query you provide needs a full table scan: each and every record has to be checked against your criterion last_maintenence < last_interaction, so no index can be used for the search itself.
The index last_maintenence_idx is used only because you want the results ordered by it.

That's easy: mistake no. 1.
MySQL can (mostly) use only one index per query, and the optimizer picks whichever one it estimates is best.
So you must create a composite index with both fields.

Related

Optimize Indexes for Particular Query in mySQL

I have a fairly simple query that is taking about 14 seconds to complete and I would like to speed it up. I think I have the correct indexes in place, but I'm not sure...
Here is the query
SELECT *
FROM opportunities
WHERE cid = 7785
AND STATUS != 4
AND otype != 200
AND links > 0
AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;
Here is the table schema
CREATE TABLE `opportunities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cid` int(11) NOT NULL,
`url` varchar(900) CHARACTER SET utf8 NOT NULL,
`status` tinyint(4) NOT NULL,
`links` int(11) NOT NULL,
`otype` int(11) NOT NULL,
`reserved` tinyint(4) NOT NULL,
`ontopic` varchar(3) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `cid` (`cid`,`url`),
KEY `cid1` (`cid`),
KEY `url` (`url`),
KEY `otype` (`otype`),
KEY `reserved` (`reserved`),
KEY `ontopic` (`ontopic`),
KEY `status` (`status`),
KEY `links` (`links`),
KEY `ontopic_links` (`ontopic`,`links`),
KEY `cid_status_otype_links_ontopic` (`cid`,`status`,`otype`,`links`,`ontopic`)
) ENGINE=InnoDB AUTO_INCREMENT=13022832 DEFAULT CHARSET=latin1
Here is the result of the EXPLAIN command
id: 1
select_type: Simple
table: opportunities
partitions: null
type: range
possible_keys: cid,cid1,otype,ontopic,status,links,ontopic_links,cid_status_otype_links_ontopic
key: links
key_len: 4
ref: null
rows: 1531552
filtered: 0.33
Extra: Using index condition; Using where
Thoughts / Questions
Am I reading it correctly that it is using the "links" key to do the query? Why wouldn't it use a more complete index, like the cid_status_otype_links_ontopic which covers all the conditions of my query?
Thanks in advance!
As requested
There are 30,961 results that match the query when you remove the LIMIT 0,100. Interestingly, the "count()" command returns almost instantaneously.
It's a funny thing about inequality comparisons: they count as range conditions.
That is, equality matches exactly one value, but anything other than equality (!=, >, <, IN, BETWEEN) can match multiple values.
Once a column in a composite index is used in a range condition, no column after it in that index can be used. You'd think that your index cid_status_otype_links_ontopic has all the columns mentioned in the conditions of your query, but only the first two will be used: the first because you have an equality comparison for cid, and the second because status is used in an inequality comparison, which is where MySQL stops using columns from the index.*
Evidence: if you force that index to be used, you should see the key_len field of the EXPLAIN result show only 5, which is the size of cid (4 bytes) + status (1 byte).
The MySQL optimizer apparently has predicted that it would be more beneficial to use your links index, because that allows it to access the rows in index order, which is the same as the sort order you requested with your ORDER BY.
Evidence: you don't see "Using filesort" in your EXPLAIN notes.
Is that really better than using one of the other indexes? Maybe, maybe not. The optimizer's predictions aren't always perfect.
You can use an index hint to override the optimizer's choice:
SELECT * FROM opportunities USE INDEX (cid_status_otype_links_ontopic) WHERE ...
Try that out, do the EXPLAIN of that query and compare it to your other EXPLAIN. Then execute both queries and see which is reliably faster.
(* Actually, I have to add a footnote about the index column usage. MySQL 5.6 and later can do a little bit better than just the two columns, when you see the note "Using Index Condition" in the EXPLAIN. But it's not quite the same. You can read more about that here: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html)
What you have must plow through all of the rows, using your 5-column index, then sort the results and deliver 100 rows.
The only index likely to be useful is INDEX(cid, links). This is because cid is the only column being tested with =, then having links might be useful for the ORDER BY and LIMIT. There is still the risk that the != tests will require filtering a lot of rows.
Are status and otype multi-valued? If either has only 2 values, then turning the != into = and adding it to the index would be beneficial.
Do you really need all the columns (SELECT *)? If not, and if you don't need any big columns (url), then you could go with a 'covering' index.
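As a sketch of the two suggestions above (index names are illustrative):

```sql
-- The likely-useful index: the equality column first, then the ORDER BY column
ALTER TABLE opportunities ADD INDEX cid_links (cid, links);

-- A hypothetical 'covering' variant, if the query can be narrowed from
-- SELECT * to just these columns (url excluded because it is large):
ALTER TABLE opportunities
    ADD INDEX cid_links_cover (cid, links, status, otype, ontopic);
```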
More on writing indexes.

Using a covering index to select records for a certain day

I would like to run these queries:
select url from weixin_kol_status where created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
and
select url from weixin_kol_status where userid in ('...') and created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
… using this table definition:
CREATE TABLE `weixin_kol_status` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`url` varchar(512) NOT NULL,
`created_at` datetime NOT NULL,
`title` varchar(512) NOT NULL DEFAULT '',
`text` text,
`attitudes_count` int(11) NOT NULL DEFAULT '0',
`readcount` int(11) NOT NULL DEFAULT '0',
`reposts_count` int(11) NOT NULL DEFAULT '0',
`comments_count` int(11) NOT NULL DEFAULT '0',
`userid` varchar(32) NOT NULL,
`screen_name` varchar(32) NOT NULL,
`type` tinyint(4) NOT NULL DEFAULT '0',
`ext_data` text,
`is_topline` tinyint(4) NOT NULL DEFAULT '0',
`is_business` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_url` (`url`(255)),
KEY `idx_userid` (`userid`),
KEY `idx_name` (`screen_name`),
KEY `idx_created_at` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=328727437 DEFAULT CHARSET=utf8
The table has 328,727,437 rows.
The queries take several minutes. How can I optimize the queries? How can I use the covering index?
The execution plans are:
explain select id from weixin_kol_status where created_at>='2015-12-11 00:00:00' and created_at<='2015-12-11 23:59:59'\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: weixin_kol_status
type: range
possible_keys: idx_created_at
key: idx_created_at
key_len: 5
ref: NULL
rows: 1433704
Extra: Using where; Using index
1 row in set (0.00 sec)
and
explain select id from weixin_kol_status where created_at='2015-12-11 00:00:00'\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: weixin_kol_status
type: ref
possible_keys: idx_created_at
key: idx_created_at
key_len: 5
ref: const
rows: 1
Extra: Using index
1 row in set (0.00 sec)
But why does the first query show Extra: Using where; Using index, while the second shows only Using index? Did the first query fail to use the covering index?
How can I use a covering index?
Do you know what a covering index is? It's an index that contains all the columns that you need for your query. So for
select url from weixin_kol_status where created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
the minimum covering index would be something like
KEY `idx_created_url` (`created_at`, `url`)
And for
select url from weixin_kol_status where userid in ('...') and created_at>'2015-12-11 00:00:00' and created_at<'2015-12-11 23:59:59';
the minimum covering index might be
KEY `idx_created_user_url` (`created_at`, `userid`, `url`)
which would also cover the first query or
KEY `idx_user_created_url` (`userid`, `created_at`, `url`)
which wouldn't work for the first query but may better optimize the second.
You might have to write out url(512) instead of just url. VARCHAR columns don't index well. If you get an error about the indexed values being too wide, you may not be able to use a covering index with this query.
A covering index is helpful because it can answer everything from the index in memory without going to the table on disk. Since memory is faster than disk, it has the effect of speeding up the query. Of course, if your index is paged out, you will still have to load it from disk. So if you're memory bound, this may not help.
Note that a query will only use one index per table, so the separate indexes on each column won't cover either query. You need a compound index to cover all the needed columns at once.
As a side note, I think that your > and < should be >= and <= respectively. Probably won't make much difference, but you seem to be skipping two seconds a day.
Several issues
UNIQUE(url(255)) constrains only the first 255 characters to be unique; this was probably not desired.
If you need to force uniqueness of a long string (url), add another column with MD5(url) and make that column UNIQUE. (Or something like that.)
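One way that suggestion might look (column and index names are illustrative). On MySQL 5.7+ a stored generated column keeps the hash in sync automatically; on older versions the application would have to maintain it:

```sql
-- Enforce uniqueness of the full URL via a fixed-width hash column:
ALTER TABLE weixin_kol_status
    ADD COLUMN url_md5 CHAR(32) AS (MD5(url)) STORED,
    ADD UNIQUE KEY idx_url_md5 (url_md5),
    DROP KEY idx_url;  -- the old 255-char prefix UNIQUE is no longer needed
```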
There is a limit of 767 bytes per column in an index, so if you try to create INDEX(created_at, url), you get INDEX(created_at, url(255)), which is not covering, since not all of url is in the index.
Both EXPLAINs are useless for this discussion since they are not using the SELECTs you are asking about. In the first, it says Using index because you say SELECT id; the actual query is SELECT url. This makes a big difference in performance.
You have a very large table. I see no way for PARTITIONing to help with speed.
This is a better way to express a 1-day WHERE:
created_at >= '2015-12-11'
AND created_at < '2015-12-11' + INTERVAL 1 DAY
To speed it up
Here is an off-the-wall technique that should help both queries. Instead of
PRIMARY KEY (`id`),
KEY `idx_created_at` (`created_at`)
Do it this way:
PRIMARY KEY(created_at, id),
INDEX(id)
That will "cluster" on created_at, thereby cutting down on I/O significantly, especially for the first SELECT. Yes, it is OK for an AUTO_INCREMENT to be just INDEXed, not UNIQUE nor PRIMARY KEY.
Caution: To make the change will take hours and a lot of disk space.
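The change above could be sketched as a single ALTER (it rewrites the entire 328M-row table, hence the caution):

```sql
ALTER TABLE weixin_kol_status
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (created_at, id),  -- clusters rows by created_at
    ADD INDEX (id);                    -- keeps the AUTO_INCREMENT column keyed
```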

Optimize index for multi field ordering with mixed direction

I'm trying to optimize a MySQL table for faster reads. The ratio of reads to writes is about 100:1, so I'm willing to sacrifice write performance with multiple indexes.
The relevant fields for my table are the following, and it contains about 200,000 records:
CREATE TABLE `publications` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
-- omitted fields
`publication_date` date NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT '0',
`position` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
-- these are just attempts, they are not production index
KEY `publication_date` (`publication_date`),
KEY `publication_date_2` (`publication_date`,`position`,`active`)
) ENGINE=MyISAM;
Since I'm using Ruby on Rails to access data in this table I've defined a default scope for this table which is
default_scope where(:active => true).order('publication_date DESC, position ASC')
i.e. every query in this table by default will be completed automatically with the following SQL fragment, so you can assume that almost all queries will have these conditions
WHERE `publications`.`active` = 1 ORDER BY publication_date DESC, position
So I'm mainly interested in optimize this kind of query, plus queries with publication_date in the WHERE condition.
I tried with the following indexes in various combinations (also with multiple of them at the same time)
`publication_date`
`publication_date`,`position`
`publication_date`,`position`,`active`
However, a simple query like this one still doesn't use the index properly and resorts to a filesort:
SELECT `publications`.* FROM `publications`
WHERE `publications`.`active` = 1
AND (id NOT IN (35217,35216,35215,35218))
ORDER BY publication_date DESC, position
LIMIT 8 OFFSET 0
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: publications
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 34903
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Some considerations on my issue:
According to the MySQL documentation, a composite index can't be used for ordering when you mix ASC and DESC in the ORDER BY clause.
active is a boolean flag, so putting it in a standalone index makes no sense (it has just 2 possible values), but it's always used in the WHERE clause, so it should appear somewhere in an index to avoid Using where in Extra.
position is an integer with few possible values, and it's always used scoped to publication_date, so I think it's useless to have it in a standalone index.
Lots of queries use publication_date in the WHERE part, so it can be useful to have it in a standalone index too, even if that's redundant with it being the first column of the composite index.
One problem is that you are mixing sort orders in the ORDER BY clause. You could invert your position (inverted_position = max_position - position) so that you can also invert the sort order on that column.
You can then create a compound index on (publication_date, inverted_position) and change your ORDER BY clause to publication_date DESC, inverted_position DESC.
The active column should most likely not be part of the index as it has a very low selectivity.
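A sketch of the inverted-position approach, assuming position never exceeds some known bound (here 1000, purely illustrative):

```sql
ALTER TABLE publications ADD COLUMN inverted_position INT;
UPDATE publications SET inverted_position = 1000 - position;

ALTER TABLE publications
    ADD INDEX pub_date_inv_pos (publication_date, inverted_position);

-- Both ORDER BY keys now run in the same direction, so the
-- compound index can satisfy the sort:
SELECT * FROM publications
WHERE active = 1
ORDER BY publication_date DESC, inverted_position DESC
LIMIT 8;
```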

Adding index to optimize MySQL query

We have the following MySQL table with about 150 million rows:
CREATE TABLE `data` (
`datetime` datetime NOT NULL,
`value1` decimal(12,6) NOT NULL,
`value2` decimal(12,6) NOT NULL,
`value3` decimal(12,6) NOT NULL,
`value4` decimal(12,6) NOT NULL,
`value5` decimal(12,6) NOT NULL,
`symbol_id` int(11) NOT NULL,
PRIMARY KEY (`symbol_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The 150 million rows are evenly split between 9500 symbols, designated by symbol_id.
I am trying to run the following query on the table:
SELECT datetime FROM data WHERE symbol_id = 1234 AND datetime <= "2013-03-01 15:00:00" ORDER BY datetime DESC LIMIT 1
Running an EXPLAIN on the query returns:
id: 1
select_type: SIMPLE
table: data
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 12
ref: NULL
rows: 23856
Extra: Using where; Using index
The query takes about 300ms on average to run. What index could I add to make this faster?
Thanks!
As Gordon hinted at, there's no index that will improve the performance of your query.
That's not to say there's nothing you can do to make it faster - tune your DBMS and OS I/O. You've not provided any info about how it is currently configured, what it's running on, or what the usage patterns are like. If you've not started this process, running mysqltuner.pl against your installation would be a good start - though it's not always completely correct. Using a different engine may improve performance for this query - but that depends on everything else going on on your system.
You'll get big gains by sharding the index across multiple disks and/or using SSDs for the index storage. More memory nearly always helps.
Go get a good book on MySQL tuning, spend time reading it.
The performance of this query can possibly be improved with an index but first you must determine the cardinality of your columns.
SELECT COUNT(DISTINCT `datetime`) FROM `data`;
SELECT COUNT(DISTINCT `symbol_id`) FROM `data`;
Whichever returns the higher number of unique values has the higher cardinality; for an optimal composite index, the columns should appear in descending order of cardinality.
You currently have a composite primary key with columns in the following order.
PRIMARY KEY (`symbol_id`,`datetime`)
If symbol_id has a higher cardinality than datetime then the query cannot be optimized further. On the other hand, if datetime has a higher cardinality then you should add an index with datetime followed by symbol_id.
INDEX idx_datetime_symbol (`datetime`,`symbol_id`)

MySQL: Get the first row that has a field >= value - is there a way to speed it up?

There is a table:
CREATE TABLE `test` (
`thing` int(10) unsigned NOT NULL,
`price` decimal(10,2) unsigned NOT NULL,
PRIMARY KEY (`thing`,`price`),
KEY `thing` (`thing`),
KEY `price` (`price`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and some values there:
INSERT INTO test(thing,price)
VALUES
(1,5.00),
(2,7.50),
(3,8.70),
(4,9.00),
(5,9.50),
(6,9.75),
(7,10.00),
(8,10.50),
(9,10.75),
(10,11.00),
(11,11.25);
I want to get the MINIMAL price from this table that is MORE than, say, 9.2 - that is, the (5,9.50) record. So, I do:
SELECT thing, price FROM test WHERE price > 9.2 ORDER BY price LIMIT 1;
Its EXPLAIN output says that MySQL goes through all 7 rows that are more than 9.2:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE test range price price 5 \N 7 Using where; Using index
Is there a way to speed this up somehow? So that MySQL would give me ONE record that is a little more than my condition?
Thanks in advance for your help!
How fast do you want it to be? How slow is it now? This could be a case of premature optimization. Please understand that the number of rows in the EXPLAIN plan is just an estimate.
But nonetheless, try this and see if it makes any difference:
SELECT thing, min(price) FROM test WHERE price > 9.2
Create an index on price. You probably don't want to have the primary key on (thing, price), because that would allow a thing to have more than one price. Just make thing the primary key. (I am assuming that you don't want a particular thing to have multiple price values.)
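In sketch form (this assumes each thing really does have exactly one price, so the existing data contains no duplicate thing values):

```sql
-- Make `thing` the sole primary key; the separate KEY `price` already
-- present in the schema then serves the range-plus-LIMIT lookup:
ALTER TABLE test
    DROP KEY thing,          -- redundant once thing is the primary key
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (thing);

SELECT thing, price FROM test
WHERE price > 9.2
ORDER BY price LIMIT 1;
```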