I have this query:
EXPLAIN SELECT * FROM _mod_news USE INDEX (ind1)
WHERE show_lv = 1 AND active = 1 AND START <= NOW()
  AND (END >= NOW() OR END = "0000-00-00 00:00:00")
  AND id <> "18041" AND category_id = "3" AND leta = 1
ORDER BY sort_id ASC, DATE DESC LIMIT 7
result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE _mod_news ref ind1 ind1 2 const,const 11386 Using where; Using filesort
MySQL is performing a full table scan.
ind1 is defined as:
ALTER TABLE `_mod_news` ADD INDEX ind1 ( `show_lv`, `active`, `start`, `end`, `id`, `category_id`, `leta`, `sort_id`, `date`);
I also tested the following index, but nothing changed:
ALTER TABLE `_mod_news` ADD INDEX ind1 ( `show_lv`, `active`, `start`, `end`, `id`, `category_id`, `leta`);
My question is: where can I learn how to create indexes for many WHERE conditions? Or can someone explain how to tell MySQL to use an index instead of scanning the whole table?
Thanks.
I would suggest not forcing the index. MySQL is great at selecting the best possible index, unless you have a better understanding of the data you are querying.
You cannot use the ORDER BY optimization because you are mixing ASC and DESC in that clause.
Therefore your only option is to create the index such that (see the sketch after this list):
constant values before ranges
integers before dates, dates before strings, smaller-size values before bigger-size values
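Applied to the query above, those rules would give something like the following sketch: the equality columns first (the order among them is a judgment call), then the single range column. This only reorders the columns from the question's index; the exact column set is an assumption.

-- A minimal sketch: constants (=) first, the range column (`start`) last.
-- MySQL stops using index columns at the first range condition, so columns
-- placed after `start` would not help this WHERE clause anyway.
ALTER TABLE `_mod_news` DROP INDEX ind1;
ALTER TABLE `_mod_news`
  ADD INDEX ind1 (`show_lv`, `active`, `category_id`, `leta`, `start`);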
Creating a large index also adds overhead to storage and to insert/update time, so I would not add fields to the index that do not eliminate a lot of rows (e.g. a field where 90% of rows have the value 1, or id <> "18041", which most likely eliminates less than 1% of rows).
If you want to learn more about optimizing: http://dev.mysql.com/doc/refman/5.0/en/select-optimization.html
Create multiple different indexes (on a decent-sized sample of the data you expect to see in the table), see which one MySQL chooses, benchmark them by forcing each one, then use your common sense to cut down on index space usage.
You can see from your EXPLAIN output that it is actually NOT performing a full table scan: the type column shows ref (not ALL) and the key column shows ind1 being used. If it were scanning the whole table, EXPLAIN would not show the index in use even though you are forcing it.
You can try with USE INDEX or FORCE INDEX
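For example, swapping the hint in the original query (a sketch; FORCE INDEX tells the optimizer a table scan is extremely expensive, so it will use the named index whenever possible, while USE INDEX is only a suggestion):

SELECT * FROM _mod_news FORCE INDEX (ind1)
WHERE show_lv = 1 AND active = 1 AND `start` <= NOW()
  AND (`end` >= NOW() OR `end` = "0000-00-00 00:00:00")
  AND id <> "18041" AND category_id = "3" AND leta = 1
ORDER BY sort_id ASC, `date` DESC LIMIT 7;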
I have these indexes on the products table:
PRIMARY
products_gender_id_foreign
products_subcategory_id_foreign
idx_products_cat_subcat_gender_used (multiple column index)
QUERY:
select `id`, `name`, `price`, `images`, `used`
from `products`
where `category_id` = '1' and
`subcategory_id` = '2' and
`gender_id` = '1' and
`used` = '0'
order by `created_at` desc
limit 24 offset 0
Question:
Why does MySQL use the index
products_subcategory_id_foreign
instead of
idx_products_cat_subcat_gender_used (the multiple-column index)?
Here is the EXPLAIN output:
1 SIMPLE products NULL ref products_gender_id_foreign,products_subcategory_id... products_subcategory_id_foreign 5 const 2 2.50 Using index condition; Using where; Using filesort
As explained in the MySQL documentation, an index can be ignored in some circumstances. The ones that could apply in your case, since one index is already being used, are:
You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 8.2.1.1, “WHERE Clause Optimization”.
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
My guess is that the values of category_id are not sparse enough.
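One way to check that guess (a sketch; SHOW INDEX reports the cardinality estimates the optimizer works from):

-- Actual distribution of category_id values:
SELECT category_id, COUNT(*) FROM products GROUP BY category_id;

-- Per-index cardinality as the optimizer sees it:
SHOW INDEX FROM products;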
As I say here, this
where `category_id` = '1' and
`subcategory_id` = '2' and
`gender_id` = '1' and
`used` = '0'
order by `created_at` desc
limit 24 offset 0
needs a 5-column composite index:
INDEX(category_id, subcategory_id, gender_id, used, -- in any order
created_at)
to get to the LIMIT, thereby not having to fetch lots of rows and sort them.
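Spelled out as a full statement, the suggestion might look like this sketch (the index name is made up for illustration):

-- The four equality columns first (in any order), then created_at,
-- so MySQL can read rows already sorted and stop after 24 of them.
ALTER TABLE `products`
  ADD INDEX idx_cat_subcat_gender_used_created
    (`category_id`, `subcategory_id`, `gender_id`, `used`, `created_at`);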
As for your actual question about which index it picked... probably the cardinality statistics of one inadequate index looked better to the optimizer than the other's.
I have created the table measurements as listed below.
This table is written to periodically and will rapidly grow to contain millions of rows after a few days.
On read: I only need the precise time of the measurement and its value (unix_epoch and value).
To improve performance, I have added the column date_from_epoch, which is the day extracted from unix_epoch (the measurement's precise time) in the format yyyymmdd. It should have good selectivity (after multiple days of measurements have been written to the table), and I am using it as the key for an index. On read, I am hoping to scan only the days for which I want the measurements, and not all the days present in the table (example: after 10 days, if 1,000,000 rows are added each day, I am hoping to scan only 1,000,000 rows when I need data contained within one day, not 10,000,000).
I have also:
used innoDB for the engine
partitioned the table by hash into 10 files to help with I/O
made sure the type used in my query is the same as the column type (or did I get this verification wrong?).
Question:
I ran a test after measurements had been trickling into the measurements table for 2 days.
Using EXPLAIN, I see my read query does not use the index. Why is the query not using the index?
Table is created with:
CREATE TABLE measurements(
date_from_epoch INT UNSIGNED,
unix_epoch INT UNSIGNED,
application_name varchar(255),
environment varchar(255),
metric_name varchar(255),
host_name varchar(1024),
value FLOAT(38,3)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY HASH(unix_epoch)
PARTITIONS 10;
CREATE TRIGGER write_epoch_day
BEFORE INSERT ON measurements
FOR EACH ROW
SET NEW.date_from_epoch = FROM_UNIXTIME(NEW.unix_epoch, '%Y%m%d');
ALTER TABLE measurements ADD INDEX (date_from_epoch);
The query is:
EXPLAIN SELECT * FROM measurements
WHERE date_from_epoch >= 20150615 AND date_from_epoch <= 20150615
AND unix_epoch >= 1434423478 AND unix_epoch <= 1434430678
AND BINARY application_name = 'all'
AND BINARY environment = 'prod'
AND BINARY metric_name = 'Internet availability'
AND (BINARY host_name = 'kitkat' )
ORDER BY unix_epoch ASC;
Explain gives:
id select_type table type possible_keys key key_len ref rows Extra
-------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE measurements ALL date_from_epoch 118011 Using where; Using filesort
Thanks for reading and head-scratching!
There is an option to use FORCE INDEX in MySQL.
Refer to this for a better understanding.
Thanks Sashi!
I have modified the query to
EXPLAIN SELECT * FROM measurements FORCE INDEX (date_from_epoch)
WHERE date_from_epoch >= 20150615 AND date_from_epoch <= 20150615
AND unix_epoch >= 1434423478 AND unix_epoch <= 1434430678
AND BINARY application_name = 'all'
AND BINARY environment = 'prod'
AND BINARY metric_name = 'Internet availability'
AND BINARY host_name = 'kitkat'
ORDER BY unix_epoch ASC;
EXPLAIN still says "Using where; Using filesort", but the number of rows scanned is now down to 67,906 vs. the 118,011 initially scanned (which is great).
The number of rows for date_from_epoch = 20150615 is 113,182, though. I am now wondering why the number of rows scanned is not 113,182 (not that I want it to go up, but I would like to understand what MySQL did to optimize the execution even further).
A lot of things need fixing:
Don't use PARTITION BY HASH; it does not help.
Since you have a range across the partition key, it must touch all partitions. See EXPLAIN PARTITIONS SELECT ....
Don't bother with the extra date_from_epoch column and the trigger; just do comparisons on unix_epoch. (See the manual for the conversion routines needed, e.g. UNIX_TIMESTAMP().)
Don't use BINARY. Instead, specify the columns as COLLATION utf8_bin. Performance will be much better.
Normalize (or turn into an ENUM) these fields: application_name, environment, metric_name, host_name. What you have is unnecessarily bulky for millions of rows. (I am assuming there are only a few distinct values for those fields.) The space savings will make the SELECT run much faster.
FLOAT(38, 3) has an extra (unnecessary?) rounding. Simply use FLOAT.
(After making the above changes) INDEX(application_name, environment, metric_name, host_name, unix_epoch) would be quite helpful, at least for that one query, and it will be significantly better than the INDEX you are asking about. (A sketch follows this list.)
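Putting points 3, 4, and 7 together, the reworked index and query might look like this sketch. The index name is made up; it assumes the string columns have been switched to a binary collation so the BINARY casts can be dropped, and that point 5 has shrunk them (a key over the original utf8 varchar(1024) host_name would exceed InnoDB's index-length limits):

-- Equality columns first, the unix_epoch range last.
ALTER TABLE measurements
  ADD INDEX idx_app_env_metric_host_epoch
    (application_name, environment, metric_name, host_name, unix_epoch);

-- Compare directly on unix_epoch; no helper column or trigger needed.
-- (Bounds can also be built with UNIX_TIMESTAMP('yyyy-mm-dd hh:mm:ss').)
SELECT * FROM measurements
WHERE application_name = 'all'
  AND environment = 'prod'
  AND metric_name = 'Internet availability'
  AND host_name = 'kitkat'
  AND unix_epoch >= 1434423478 AND unix_epoch <= 1434430678
ORDER BY unix_epoch ASC;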
I have a MySQL InnoDB table with two INT columns, say col1 and col2. I'd like to add an index that will allow me to:
SELECT * from myTable WHERE col0=5 ORDER BY col1*col2 DESC
Is it possible to have an index that will support such sorting, or will I need to add a column that keeps that value (col1*col2)?
Noam, see ORDER BY Optimization. If you want to use the index for sorting, it should be the same index that is used in the WHERE clause, and of course the value for sorting needs to be stored in its own column. Here I generated a test table with 100k rows that should match your situation.
1.) Adding ONE index on two columns (this works for utilizing an index for both SELECT and sort):
ALTER TABLE `test_data` ADD INDEX super_sort (`col0`,`sort_col`);
EXPLAIN SELECT * FROM `test_data` WHERE col0 = 50 ORDER BY sort_col;
key -> super_sort; Extra -> using where
(index is used for WHERE and SORT)
2.) Adding two indexes, one for WHERE and one for SORT (won't work)
ALTER TABLE `test_data` DROP INDEX `super_sort`;
ALTER TABLE `test_data` ADD INDEX (`col0`);
ALTER TABLE `test_data` ADD INDEX (`sort_col`);
EXPLAIN SELECT * FROM `test_data` WHERE col0 = 50 ORDER BY sort_col;
key -> col0; Extra -> Using where; Using filesort
(an index is used for WHERE, BUT NOT for sorting)
So the answer is: yes, you will need a column that keeps that value (col1*col2), AND you need ONE index on both columns: col0 (for the WHERE clause) + sort_col (for sorting), as in the first example. As soon as you ORDER BY any calculation (e.g. col1*col2), no index can be used for sorting.
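If you are on MySQL 5.7 or later, a stored generated column can maintain that value for you; a minimal sketch with made-up column and index names (on older versions, maintain the column manually as described above):

-- sort_col is computed from col1*col2 on write and stored, so it can be
-- indexed together with the WHERE column.
ALTER TABLE myTable
  ADD COLUMN sort_col INT AS (col1 * col2) STORED,
  ADD INDEX super_sort (col0, sort_col);

-- The query then sorts by the stored column instead of the expression:
SELECT * FROM myTable WHERE col0 = 5 ORDER BY sort_col DESC;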
You can add a new column that contains the value of col1*col2 and use it for sorting. Otherwise you are stuck with SELECT * FROM myTable WHERE col0=5 ORDER BY col1*col2 DESC, which cannot use an index for the sort.
I have a simple key-value table with two fields, created like so:
CREATE TABLE `mytable` (
`key` varchar(255) NOT NULL,
`value` double NOT NULL,
KEY `MYKEY` (`key`)
);
The keys are not unique. The table contains over one million records. I need a query that will sum up all the values for a given key, and return the top 10 keys. Here's my attempt:
SELECT t.key, SUM(t.value) value
FROM mytable t
GROUP BY t.key
ORDER BY value DESC
LIMIT 0, 10;
But this is very slow. Thing is, without the GROUP BY and SUM it's pretty fast, and without the ORDER BY it's very fast, but for some reason the combination of the two makes it very, very slow. Can anyone explain why this is so, and how it can be sped up?
There is no index on value. I tried creating one but it didn't help.
EXPLAIN EXTENDED produces the following in Workbench:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index NULL MYKEY 257 NULL 1340532 100.00 "Using temporary; Using filesort"
There are about 400K unique keys in the table.
The query takes over 3 minutes to run; I don't know exactly how long, because I stopped it after 3 minutes. However, if I remove the index on key, it runs in 30 seconds! Does anyone have any idea why?
The only way to really speed this up, as far as I can see, is to create a separate table with unique keys and maintain the total values in it. Then you will be able to index the values to retrieve the top ten quickly, and the calculation will already be done. As long as the table is not updated in too many places, this shouldn't be a major problem.
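A minimal sketch of that approach, with made-up table and index names, keeping the running totals up to date at insert time:

-- Summary table: one row per key, with an index on the running total
-- so the top ten can be read straight off the index.
CREATE TABLE mytable_totals (
  `key` varchar(255) NOT NULL PRIMARY KEY,
  `total` double NOT NULL,
  KEY idx_total (`total`)
);

-- Maintain it wherever mytable is written to:
INSERT INTO mytable_totals (`key`, `total`)
VALUES ('some-key', 42.0)
ON DUPLICATE KEY UPDATE total = total + VALUES(total);

-- The top ten then becomes a simple index-backed read:
SELECT `key`, total FROM mytable_totals ORDER BY total DESC LIMIT 10;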
The major problem with this type of query is that the GROUP BY wants the rows in one order while the ORDER BY requires sorting them into a different order.
I have a simple MyISAM table resembling the following (trimmed for readability -- in reality, there are more columns, all of which are constant width and some of which are nullable):
CREATE TABLE IF NOT EXISTS `history` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`time` int(11) NOT NULL,
`event` int(11) NOT NULL,
`source` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `event` (`event`),
KEY `time` (`time`)
);
Presently the table contains only about 6,000,000 rows (of which currently about 160,000 match the query below), but this is expected to increase. Given a particular event ID and grouped by source, I want to know how many events with that ID were logged during a particular interval of time. The answer to the query might be something along the lines of "Today, event X happened 120 times for source A, 105 times for source B, and 900 times for source C."
The query I concocted does perform this task, but it performs monstrously badly, taking well over a minute to execute when the timespan is set to "all time" and in excess of 30 seconds for as little as a week back:
SELECT COUNT(*) AS count FROM history
WHERE event=2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC
This is not for real-time use, so even if the query takes a second or two that would be fine, but several minutes is not. Explaining the query gives the following, which troubles me for obvious reasons:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE history ref event,time event 4 const 160399 Using where; Using temporary; Using filesort
I've experimented with various multi-column indexes (such as (event, time)), but with no improvement. This seems like such a common use case that I can't imagine there not being a reasonable solution, but my Googling all boils down to versions of the query I already have, with no particular suggestions on how to avoid the temporary table (and no explanation of why performance is so abysmal even then).
Any suggestions?
You say you have tried multi-column indexes. Have you also tried single-column indexes, one per column?
UPDATE: Also, the COUNT(*) operation over a GROUP BY clause is probably a lot faster if the grouped column also has an index on it... Of course, this depends on the number of NULL values actually in that column, which are not indexed.
For event, MySQL can execute a UNIQUE SCAN, which is quite fast, whereas for time a RANGE SCAN will be applied, which is not so fast... With separate indexes, I'd expect better performance than with multi-column ones.
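Concretely, the grouped column source has no index in the schema above; a sketch of adding one (the index name is made up):

-- Single-column index on the GROUP BY column, per the suggestion above.
ALTER TABLE `history` ADD INDEX idx_source (`source`);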
Also, maybe you could gain something by partitioning your table by some expected values / value ranges:
http://dev.mysql.com/doc/refman/5.5/en/partitioning-overview.html
I suggest you try this multi-column index:
ALTER TABLE `history` ADD INDEX `history_index` (`event` ASC, `time` ASC, `source` ASC);
Then, if it doesn't help, try steering MySQL toward that index in the query with USE INDEX (or the stronger FORCE INDEX):
SELECT COUNT(*) AS count FROM history USE INDEX (history_index)
WHERE event=2000 AND time >= 0 AND time < 1310563644
GROUP BY source
ORDER BY count DESC
If the sources are known, or you want to find the count for specific sources, then you can try something like this:
SELECT COUNT(source = 'A' OR NULL) AS A, COUNT(source = 'B' OR NULL) AS B FROM history;
As for the ordering, you can do it in your application code. Also try indexing event and source together (a sketch follows below).
This will definitely be faster than the original query.
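A sketch of that composite index (the name is made up); with it in place, the per-source counts for one event can be taken from the index alone:

ALTER TABLE `history` ADD INDEX idx_event_source (`event`, `source`);

SELECT COUNT(source = 'A' OR NULL) AS A,
       COUNT(source = 'B' OR NULL) AS B
FROM history
WHERE event = 2000;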