MySQL seeking pagination on big composite primary key

Let's say I have a MySQL table defined like this:
CREATE TABLE big_table (
  primary_1 varbinary(1536),
  primary_2 varbinary(1536),
  ts timestamp(6),
  ...
  PRIMARY KEY (primary_1, primary_2),
  KEY ts_idx (ts)
);
I would like to implement efficient pagination (seeking pagination) as described in this blog post https://use-the-index-luke.com/sql/partial-results/top-n-queries
If I only use the first part of the primary key, the pipelined execution works fast and as expected:
mysql> explain select * from big_table order by ts, primary_1 limit 5;
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
| 1 | SIMPLE | big_table | NULL | index | NULL | ts_idx | 7 | NULL | 5 | 100.00 | NULL |
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
However, if I add the second part of the primary key to the ORDER BY clause everything slows down and filesort starts being used:
mysql> explain select * from big_table order by ts, primary_1, primary_2 limit 5;
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| 1 | SIMPLE | big_table | NULL | ALL | NULL | NULL | NULL | NULL | 6388499 | 100.00 | Using filesort |
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
Is it not possible to do this pipelined execution and ordering on composite primary? Or should the query be written in some special way?

Without prior knowledge of how MySQL works internally, there is no reason to assume that an index on just ts can be used to order by ts, primary_1 without doing an additional (file)sort on primary_1. Imagine, for example, the edge case that all values for ts are the same - the index will just give you all rows, which you then have to sort by primary_1.
Nevertheless, MySQL can make use of some additional information: InnoDB stores secondary indexes in a way that includes the primary key columns (to be able to find the actual row in the table). Since that information is there anyway, MySQL can just make use of it - and it does, by using Index Extensions. This basically extends the index ts to an index ts, primary_1, primary_2.
So this technical trick allows you to use the index on ts to order by ts, primary_1, primary_2. But since there is always a "but", here is the "but":
Use of index extensions by the optimizer is subject to the usual limits on the number of key parts in an index (16) and the maximum key length (3072 bytes).
The index on ts, primary_1, primary_2 would be longer than 3072 bytes; for the same reason, you cannot create such an index manually either. So the extension no longer applies, and MySQL falls back to treating the index on ts as an index on just ts.
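You can see the same limit by trying to create the extended index by hand (a sketch; ts_pk_idx is just an illustrative name, and the exact byte count depends on the row format, but two varbinary(1536) key parts plus the timestamp clearly exceed 3072 bytes):
ALTER TABLE big_table ADD INDEX ts_pk_idx (ts, primary_1, primary_2);
-- ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes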
So why does it work for order by ts, primary_1? Well, even if, for those technical reasons, MySQL cannot create an internal index on ts, primary_1, primary_2, it could at least do it for ts, primary_1 without running into technical problems. MySQL actually doesn't do that though - but the MariaDB developers implemented this trick, so I assume you are actually using MariaDB. Nevertheless, the length restriction of 3072 still applies, so your order by both primary columns still won't work.
What can you do?
If you can shorten your primary keys a bit, the index extension would work again. Primary keys that long (and of that type) are uncommon and impractical anyway (not only for this use case), so maybe you can find a different primary key for your table.
If that is not an option, you may be able to utilize some prior knowledge about your data distribution, e.g. if you know that at most 10 values for ts can be the same, you can first pick the first n+10 rows (using the index), then order only those by the primary keys. If you usually only show the first few pages, this might speed up your specific situation. But you may want to ask a separate question for it with specific details.
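A minimal sketch of that idea for the table above, assuming a page size of n = 5 and at most 10 duplicate ts values, so the inner query runs pipelined over ts_idx and the filesort only ever touches 15 rows:
SELECT *
FROM (
  SELECT * FROM big_table ORDER BY ts LIMIT 15  -- n + 10 candidates, read via ts_idx
) AS candidates
ORDER BY ts, primary_1, primary_2
LIMIT 5;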

Related

Is there a general rule for where in the order of a primary-key index to place the partition key?

Assume that I properly query the partition key in every query. Is there any sensible reason to place the partition key anywhere but first in line?
I feel like there's something I'm not understanding about how the index works. Assume MySQL and InnoDB.
I think I get that, ordinarily, you place the most selective keys first and the less selective ones later. And the partition key would ordinarily be one of the less selective ones. But if the partition key is included in every query, what difference does it make to include the partition key first? Wouldn't this help in other ways, too? E.g., I won't have to include the partition key in every index if it's up front in the primary-key index: queries using other indexes can borrow the primary key from the primary-key index consistent with the leftmost-key constraint.
And I don't know if an index itself is ever partitioned but it seems like it could be if it's a covering index. (Am I right?) If so, the partition key would have to be first, no, for the partitions to work?
E.g.:
CREATE TABLE `fee` (
  `fi` INT,
  `fo` INT,
  PRIMARY KEY (`fi`, `fo`)
) ENGINE = INNODB
PARTITION BY RANGE (`fi`) (
  . . .
);
Or . . .
CREATE TABLE `fee` (
  `fi` INT,
  `fo` INT,
  PRIMARY KEY (`fo`, `fi`)
) ENGINE = INNODB
PARTITION BY RANGE (`fi`) (
  . . .
);
Which, if either, is inherently better, and why or why not?
Thank you for your time.
The selectivity of the two columns doesn't matter as much as some people think.
If you were to query the table as:
SELECT ... FROM fee WHERE fi=? AND fo=?
Then what does it matter if it searches the B-tree by fi,fo or by fo,fi? It'll find the same record in the end, and it'll take roughly the same number of steps to do so. There's a theoretical difference, but in most cases it won't be significant.
What's more important is if you have queries that only search for one or the other column of the primary key.
You mentioned that all queries search on the partition column, that's fi in this example. Do you have any queries that search on fi but not fo?
SELECT ... FROM fee WHERE fi=?
If fi were the first column of the primary key, this would do partition-pruning, and also use the PRIMARY KEY index because your search term is on the first column.
mysql> explain partitions select * from fee where fi = 175;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | fee | p2 | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
Whereas if fi were the second column of the primary key, then it could do partition-pruning, but not use the index.
mysql> explain partitions select * from fee where fi = 175;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | fee | p2 | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
Indexes are also partitioned. Think of partitioning as a series of completely separate tables, with the same columns and same indexes, just a subset of the rows. Once the query determines which partition to read, it does the query the same way it would against a non-partitioned table, choosing an index based on the query criteria. Will it use the primary key to search?
mysql> explain partitions select * from fee where fi = 175 and created_at < now();
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | fee | p2 | range | created_at | created_at | 6 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
Here we see the condition on fi resulted in partition pruning, and yet the index on created_at was preferred by the optimizer. It searches that index in the respective partition.
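(One note for reproducibility: the fee table as defined in the question has no created_at column, so this last EXPLAIN assumes the example table was extended with something like the following.)
ALTER TABLE fee
  ADD COLUMN created_at DATETIME,
  ADD INDEX created_at (created_at);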
"you place the most selective keys first and the less selective ones later" -- No. That is an old wives tale.
Put keys that are tested with '=' first is a simple and more important rule.
Think of a composite InnoDB BTree index as working this way. Concatenate all the columns together, then picture the BTree as having a single string as the key.
Putting the "partition key" first in an index is the least useful place! You are already pruning on that; having it in the index is actually redundant. However, it is necessary for any Unique key (that includes the `PRIMARY KEY').
Yes, you correctly observed that the PK columns are implicitly included in every secondary key, hence the partition key is included.
Note that if the partition key is not really part of a desired UNIQUE key, then that uniqueness constraint is not possible (in MySQL). The tacked-on PK columns, however, are not part of the uniqueness constraint. Since MySQL is only willing to check uniqueness within one partition, you must include the partition key in the unique key to get "unique across the entire table" semantics. (Yeah, it is a bit convoluted; live with it.)
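A quick sketch of that restriction (the table name fee2 is just for illustration):
CREATE TABLE fee2 (
  fi INT NOT NULL,
  fo INT NOT NULL,
  PRIMARY KEY (fo)              -- omits the partition column fi
) ENGINE=InnoDB
PARTITION BY RANGE (fi) (
  PARTITION p0 VALUES LESS THAN MAXVALUE
);
-- ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the
-- table's partitioning function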
In your example, if you do SELECT .. WHERE fi BETWEEN 1 and 2 AND fo=3, any index (the PK is an index) starting with fi would work harder than if fo were first in the index.
So, a Rule of Thumb is to move the partition key to the end of any index that includes it. (I have seen only one rare exception; I forget the details.)
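Applied to the asker's two definitions, this rule of thumb picks the second one: with PRIMARY KEY (fo, fi), the '='-tested fo leads and the partition key fi sits at the end. A sketch with illustrative range bounds:
CREATE TABLE fee (
  fi INT NOT NULL,
  fo INT NOT NULL,
  PRIMARY KEY (fo, fi)
) ENGINE = INNODB
PARTITION BY RANGE (fi) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- SELECT ... FROM fee WHERE fi BETWEEN 1 AND 2 AND fo = 3
-- now prunes partitions on fi, then seeks fo = 3 directly inside each one.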

Optimize query?

My query took 28.39 seconds to run. How can I optimize it?
explain SELECT DISTINCT UNIX_TIMESTAMP(timestamp)*1000 AS timestamp, COUNT(a.sig_name) AS counter
FROM event a, network n
WHERE n.fsi = 'pays' AND n.net = INET_NTOA(a.ip_src)
GROUP BY DATE(timestamp)
ORDER BY timestamp ASC;
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 8177074 | Using temporary; Using filesort |
| 1 | SIMPLE | n | eq_ref | PRIMARY,fsi | PRIMARY | 77 | func | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
Looking at your query and its EXPLAIN output, we find that table event a is examining 8,177,074 rows. That is likely the "root" of the slowness, so we want to look at how to reduce the search space using indexes.
The main condition on event a is
n.net=inet_ntoa(a.ip_src)
The problem here is that we need to perform a calculation (inet_ntoa) on every row of a.ip_src, so there is no alternative but to scan the entire table. A potentially better solution would be to invert the comparison and ensure that a.ip_src is indexed.
a.ip_src=inet_aton(n.net)
This will only be better if we are matching fewer rows in n than in a. If that is not the case, you should seriously consider caching the result of this function in the table and creating an index on that column.
Lastly, I am guessing the timestamp column is in event a, in which case an index may help with the ordering and grouping, though it may not. You could try a multi-column index on (ip_src, timestamp).
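Putting those pieces together, a sketch of the inverted query (same result intended; it assumes network.fsi and event.ip_src are indexed, e.g. as suggested in the next answer):
SELECT DISTINCT UNIX_TIMESTAMP(timestamp)*1000 AS timestamp, COUNT(a.sig_name) AS counter
FROM network n
JOIN event a ON a.ip_src = INET_ATON(n.net)  -- indexed column compared bare
WHERE n.fsi = 'pays'
GROUP BY DATE(timestamp)
ORDER BY timestamp ASC;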
Make it a practice to introduce at least an index on columns used in WHERE/JOIN clauses. I say "at least" because in many cases one should try to use PRIMARY/FOREIGN KEY relations; if something is already a primary/foreign key, there is no need to index it further.
The above query can be simply improved by introducing the INDEX through the following query:
ALTER TABLE event ADD INDEX idx_ev_ipsrc (ip_src);
Here idx_ev_ipsrc = Name of the index key, and ip_src is the column to be indexed.
Even further enhancement:
Introduce multi-colum index on network table using following query:
ALTER TABLE network ADD INDEX idx_net_fsi_net (fsi,net);
The above should bring the number of examined rows down even further.
Note: The above statements are for MySQL and can easily be tailored to other databases.

MySQL join on table with partitions selecting all partitions

I have a photo gallery on my site with 1M photos in it. There are 2 search tables associated with it. Table #1 contains a list of words used in the photos. Table #2 contains a list of what words match up with what photos. Table #2 is 7M rows. I am testing partitioning this 7M row table because I have another set of tables with 120,000,000 rows. Queries against the 120M row wordmatch table below, with or without a join against the wordlist table, take multiple seconds to run.
I am trying to perform a join between these 2 tables and MySQL 5.6 EXPLAIN PARTITIONS shows it is using all the partitions. How can I redo this query to make this correctly use only a single partition?
The 2 tables:
CREATE TABLE wordlist (
  word_text varchar(50) NOT NULL DEFAULT '',
  word_id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (word_text),
  KEY word_id (word_id)
) ENGINE=InnoDB;
CREATE TABLE wordmatch (
  pic_id int(11) unsigned NOT NULL DEFAULT '0',
  word_id mediumint(8) unsigned NOT NULL DEFAULT '0',
  title_match tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (word_id, pic_id, title_match),
  KEY pic_id (pic_id)
) ENGINE=InnoDB
/*!50100 PARTITION BY HASH (word_id)
PARTITIONS 11 */;
SQL query I am performing:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon' AND m.word_id = w.word_id
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| 1 | SIMPLE | w | NULL | range | PRIMARY,word_id | PRIMARY | 52 | NULL | 1 | Using where |
| 1 | SIMPLE | m | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10 | ref | PRIMARY | PRIMARY | 3 | w.word_id | 34 | Using index |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
The join produces a query that uses all partitions.
If I retrieve the word_id # first and go straight against the wordmatch table, everything is ok:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordmatch m WHERE m.word_id = 219657;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | m | p9 | ref | PRIMARY | PRIMARY | 3 | const | 18220 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
How do I get this to work correctly?
I prefer not to split this into multiple queries if possible.
You may have noticed I am using LIKE above. People will often search on bacon% to get plurals of words, etc.
Example:
SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon%' AND m.word_id = w.word_id
I realize this wildcard search may result in 2 or more partitions being selected. This is probably ok, although if there is a way to change the partitioning to prevent that, I welcome any tips.
Edit #1: Added details as my original question was confusing. I was testing my 7M row table first before doing my 120M row table.
Edit #2: Resolution to my overall issue: my performance problems seem to be resolved now that I have partitioned my 120M row table into 101 partitions per this post: MySQL performance: partitions. I do not know whether MySQL goes against all the partitions at runtime - Ollie Jones says in the comments below that it does not and that EXPLAIN PARTITIONS is incorrect - but it is fast now, so I am happy.
Getting your query working with efficient indexing is probably a good idea before you dive into the partitioning project. Here's your query refactored to use an explicit JOIN:
SELECT m.pic_id
FROM wordlist w
JOIN wordmatch m ON w.word_id = m.word_id
WHERE w.word_text LIKE 'bacon%'
This query can use a compound index on wordlist (word_text, word_id). It will random-access the index for the first matching word_text, and then scan the index retrieving the word_id values until it gets to the last matching word_text.
It can also use your existing primary key on wordmatch (word_id, pic_id). This speeds up your query because the database engine can satisfy it directly from the index without having to bat the hard drive back and forth to the table itself.
So, give those indexes a try. Your large table, the wordmatch table, should work fairly well without partitioning. It's more common to partition tables that contain lots of content (like the text of articles) than it is to partition this kind of fixed-row-size join table.
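A sketch of that first index (the index name is mine):
ALTER TABLE wordlist ADD INDEX idx_word_text_id (word_text, word_id);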
Notice that your EXPLAIN announces it will look at all the partitions because EXPLAIN can't tell which partition (or partitions) your w.word_text LIKE 'bacon%' WHERE-clause will need to examine. EXPLAIN isn't as dumb as a box of hammers, but it is close. MySQL won't examine the partitions it doesn't need to, but it doesn't know which partitions are involved until runtime.
Have you considered using FULLTEXT search? It might simplify what you're doing.
Your first query doesn't have any filtering conditions on wordmatch table that could limit the partitions in use, thus it needs to access all partitions. There is no way to redo this query to use only necessary partitions without adding a filter on the field that is the basis for the partitioning (word_id).
The second query filters on a specific word_id value, so the partitioning function can determine exactly which partition to read.
I would also agree with the comment made by @OllieJones that I am not sure you should really worry about partitioning at only 7M rows. That is not really that big of a table in the grand schema of things.

Why doesn't MySQL use my partitions as indices?

I created a table partitioned on a numeric ID:
CREATE TABLE mytable (
...
`id` int(11) DEFAULT NULL
...
) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY HASH (`id`) PARTITIONS 100
I have no primary key, but a number of indices. I don't have any data in my table where id is less than 0 or greater than 30 (at the moment, I expect this to grow). Most of my queries first include the id to reduce the search space.
I figured a query to select distinct(id) from mytable would then just return the number of partitions that had data in it. I was surprised that an explain on this instead does a full scan of the data:
explain partitions select distinct(id) from mytable;
| 1 | SIMPLE | mytable | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99 | ALL | NULL | NULL | NULL | NULL | 24667132 | Using temporary |
explain select distinct(id) from mytable;
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
| 1 | SIMPLE | mytable | ALL | NULL | NULL | NULL | NULL | 24667132 | Using temporary |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
I then read this stackoverflow answer which enlightened how MySQL's partition hash() function works.
My question is, how can I get MySQL to map each id in the table into its own partition such that selects with the id narrow the search to a single table (and a select distinct() just has to count the number of partitions and not scan them)?
I'm using Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu).
First off, you're conflating two different things. One is the fact that a SELECT ... WHERE id = ? should only search one partition - something you mentioned but didn't say whether it currently works or not (given your table definition, I don't see why it shouldn't).
The second thing, having a SELECT DISTINCT(id) touch only the partitioning information, is very different from this. If I understand you correctly, you're assuming that one partition only holds one kind of id. That is not how HASH partitioning works, though. It works like a traditional hash table, by mapping a large key space to a small one - in your case, 100. So each partition will contain many possible IDs. Since MySQL does not keep track of which of the possible IDs are actually present in a partition, all it can do is scan each partition, do the DISTINCT, and return the result. That said, it could do the DISTINCT operation on the individual partitions instead of the whole table, and it could do this in parallel; however, the EXPLAIN seems to imply that it will create one big temporary table to do the DISTINCT, likely because this optimization hasn't been implemented yet.
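A minimal illustration of that mapping, relying on the documented MOD behavior of HASH partitioning:
-- PARTITION BY HASH (id) PARTITIONS 100 places a row in partition MOD(id, 100)
SELECT MOD(17, 100);   -- 17: id 17 lands in p17
SELECT MOD(117, 100);  -- 17: so would id 117 - one partition, many possible ids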

MySQL EXPLAIN 'type' changes from 'range' to 'ref' when the date in the where statement is changed?

I've been testing out different ideas for optimizing some of the tables we have in our system at work. Today I came across a table that tracks every view on each vehicle in our system. Create table below.
SHOW CREATE TABLE vehicle_view_tracking;
CREATE TABLE `vehicle_view_tracking` (
`vehicle_view_tracking_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`public_key` varchar(45) NOT NULL,
`vehicle_id` int(10) unsigned NOT NULL,
`landing_url` longtext NOT NULL,
`landing_port` int(11) NOT NULL,
`http_referrer` longtext,
`created_on` datetime NOT NULL,
`created_on_date` date NOT NULL,
`server_host` longtext,
`server_uri` longtext,
`referrer_host` longtext,
`referrer_uri` longtext,
PRIMARY KEY (`vehicle_view_tracking_id`),
KEY `vehicleViewTrackingKeyCreatedIndex` (`public_key`,`created_on_date`),
KEY `vehicleViewTrackingKeyIndex` (`public_key`)
) ENGINE=InnoDB AUTO_INCREMENT=363439 DEFAULT CHARSET=latin1;
I was playing around with multi-column and single column indexes. I ran the following query:
EXPLAIN EXTENDED SELECT dealership_vehicles.vehicle_make, dealership_vehicles.vehicle_model, vehicle_view_tracking.referrer_host, count(*) AS count
FROM vehicle_view_tracking
LEFT JOIN dealership_vehicles
ON dealership_vehicles.dealership_vehicle_id = vehicle_view_tracking.vehicle_id
WHERE vehicle_view_tracking.created_on_date >= '2011-09-07' AND vehicle_view_tracking.public_key IN ('ab12c3')
GROUP BY (dealership_vehicles.vehicle_make) ASC , dealership_vehicles.vehicle_model, referrer_host
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | vehicle_view_tracking | range | vehicleViewTrackingKeyCreatedIndex,vehicleViewTrackingKeyIndex | vehicleViewTrackingKeyCreatedIndex | 50 | NULL | 23086 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dealership_vehicles | eq_ref | PRIMARY | PRIMARY | 8 | vehicle_view_tracking.vehicle_id | 1 | 100.00 | |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
(Execution time for actual select query was .309 seconds)
then I change the date in the where clause from '2011-09-07' to '2011-07-07' and got the following explain results
EXPLAIN EXTENDED SELECT dealership_vehicles.vehicle_make, dealership_vehicles.vehicle_model, vehicle_view_tracking.referrer_host, count(*) AS count
FROM vehicle_view_tracking
LEFT JOIN dealership_vehicles
ON dealership_vehicles.dealership_vehicle_id = vehicle_view_tracking.vehicle_id
WHERE vehicle_view_tracking.created_on_date >= '2011-07-07' AND vehicle_view_tracking.public_key IN ('ab12c3')
GROUP BY (dealership_vehicles.vehicle_make) ASC , dealership_vehicles.vehicle_model, referrer_host
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | vehicle_view_tracking | ref | vehicleViewTrackingKeyCreatedIndex,vehicleViewTrackingKeyIndex | vehicleViewTrackingKeyIndex | 47 | const | 53676 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dealership_vehicles | eq_ref | PRIMARY | PRIMARY | 8 | vehicle_view_tracking.vehicle_id | 1 | 100.00 | |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
(Execution time for actual select query was .670 seconds)
I see 4 main changes:
type changed from range to ref
key changed from vehicleViewTrackingKeyCreatedIndex to vehicleViewTrackingKeyIndex
key_len changed from 50 to 47 (caused by the change in key)
rows changed from 23086 to 53676 (caused by the change in key)
At this point, the execution time is only .6 seconds for the slow query; however, we only have about 10% of our vehicles in our database.
It's getting late and I may have overlooked something in the MySQL docs, but I can't seem to find why the key (and in turn the type and rows) changes when the date in the WHERE clause changes.
The help is greatly appreciated. I searched for someone having the same/similar issue with a date causing this change and was not able to find anything. If I missed a previous post, please link me :-)
Different search strategies make sense for different data. In particular, index scans (such as range) often have to do a seek to actually read the row. At some point, doing all those seeks is slower than not using the index at all.
Take a trivial example, a table with three columns: id (primary key), name (indexed), birthday. Say it has a lot of data. If you ask MySQL to look for Bob's birthday, it can do that fairly quickly: first, it finds Bob in the name index (this takes a few seeks, log(n) where n is the row count), then one additional seek to read the actual row in the data file and read the birthday from it. That's very quick, and far quicker than scanning the entire table.
Next, consider doing a name LIKE 'Z%'. That is probably a fairly small portion of the table, so it's still faster to find where the Zs start in the name index, then for each one seek the data file to read the row. (This is a range scan.)
Finally, consider asking for all names starting with M-Z. That's probably around half the data. It could do a range scan, and then a lot of seeks, but seeking randomly over the datafile with the ultimate goal of reading half the rows isn't optimal: it'd be faster to just do a big sequential read over the data file. So, in this case, the index will be ignored.
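A sketch of those three cases (table and column names are illustrative):
SELECT birthday FROM people WHERE name = 'Bob';     -- a few index seeks plus one row read
SELECT birthday FROM people WHERE name LIKE 'Z%';   -- range scan: few rows, seeks still pay off
SELECT birthday FROM people WHERE name >= 'M';      -- ~half the table: a sequential full scan wins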
This is what you're seeing, except in your case there is another key it can fall back on. (It's also possible that it would actually use the date index if it didn't have the other; it should pick whichever index will be quickest. Beware that MySQL's optimizer often makes mistakes here.)
So, in short, this is expected. A query doesn't say how to retrieve the data, rather it says what data to retrieve. The database's optimizer is supposed to find the quickest way to retrieve it.
You may find that an index on both columns, in the order (public_key, created_on_date), is preferred in both cases and speeds up your query. This is because MySQL can only ever use one index per table (per query). Also, the date goes at the end because a range scan can only be done efficiently on the last column used in an index.
[InnoDB actually has another layer of indirection, I believe, but it'd just confuse the point. It doesn't make a difference to the explanation.]
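If the optimizer keeps choosing badly, MySQL also lets you pin the choice with an index hint; a sketch using the compound index the table already has (use sparingly, since a forced index stays forced even when the data distribution changes):
SELECT vehicle_id, referrer_host
FROM vehicle_view_tracking FORCE INDEX (vehicleViewTrackingKeyCreatedIndex)
WHERE public_key = 'ab12c3' AND created_on_date >= '2011-07-07';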