I have a photo gallery on my site with 1M photos in it. There are 2 search tables associated with it. Table #1 contains a list of words used in the photos. Table #2 contains a list of what words match up with what photos. Table #2 is 7M rows. I am testing partitioning this 7M row table because I have another set of tables with 120,000,000 rows. Queries against the 120M row wordmatch table below, with or without a join against the wordlist table below, take multiple seconds to run.
I am trying to perform a join between these 2 tables and MySQL 5.6 EXPLAIN PARTITIONS shows it is using all the partitions. How can I redo this query to make this correctly use only a single partition?
The 2 tables:
CREATE TABLE wordlist (
word_text varchar(50) NOT NULL DEFAULT '',
word_id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (word_text),
KEY word_id (word_id)
) ENGINE=InnoDB;
CREATE TABLE wordmatch (
pic_id int(11) unsigned NOT NULL DEFAULT '0',
word_id mediumint(8) unsigned NOT NULL DEFAULT '0',
title_match tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (word_id,pic_id,title_match),
KEY pic_id (pic_id)
) ENGINE=InnoDB
/*!50100 PARTITION BY HASH (word_id)
PARTITIONS 11 */;
SQL query I am performing:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon' AND m.word_id = w.word_id
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| 1 | SIMPLE | w | NULL | range | PRIMARY,word_id | PRIMARY | 52 | NULL | 1 | Using where |
| 1 | SIMPLE | m | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10 | ref | PRIMARY | PRIMARY | 3 | w.word_id | 34 | Using index |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
The join produces a query that uses all partitions.
If I retrieve the word_id # first and go straight against the wordmatch table, everything is ok:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordmatch m WHERE m.word_id = 219657;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | m | p9 | ref | PRIMARY | PRIMARY | 3 | const | 18220 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
How do I get this to work correctly?
I prefer not to split this into multiple queries if possible.
You may have noticed I am using LIKE above. People will often search on bacon% to get plurals of words, etc.
Example:
SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon%' AND m.word_id = w.word_id
I realize this wildcard search may result in 2 or more partitions being selected. This is probably ok, although if there is a way to change the partitioning to prevent that, I welcome any tips.
Edit #1: Added details as my original question was confusing. I was testing my 7M row table first before doing my 120M row table.
Edit #2: Resolution to my overall issue: My performance issues seem to be resolved now that I have partitioned my 120M row table into 101 partitions, per this post: MySQL performance: partitions. I do not know whether MySQL goes against all the partitions at runtime (Ollie Jones says in the comments below that it does not, and that EXPLAIN PARTITIONS is incorrect), but it is fast now, so I am happy.
Getting your query working with efficient indexing is probably a good idea before you dive into the partitioning project. Here's your query refactored to use JOIN:
SELECT m.pic_id
FROM wordlist w
JOIN wordmatch m ON w.word_id = m.word_id
WHERE w.word_text LIKE 'bacon%'
This query can use a compound index on wordlist (word_text, word_id). It will random-access the index for the first matching word_text, and then scan the index retrieving the word_id values until it gets to the last matching word_text.
It can also use your existing primary key on wordmatch, which starts with (word_id, pic_id). That speeds up your query because the database engine can satisfy it directly from the index without having to bat the hard drive back and forth to the table itself.
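As a rough illustration of that prefix scan, here is a small Python model (not MySQL code) that treats the compound index as a sorted list of (word_text, word_id) pairs. The sample words and ids are made up, and a real B-tree is more involved; the point is only that one seek finds the first match and the rest are read in order.

```python
from bisect import bisect_left

# Hypothetical index entries, kept sorted the way a B-tree on
# (word_text, word_id) would keep them.
index = sorted([
    ("apple", 7), ("bacon", 12), ("bacons", 31),
    ("bagel", 4), ("baconator", 58), ("zebra", 9),
])


def word_ids_for_prefix(index, prefix):
    """Return the word_id of every entry whose word_text starts with prefix."""
    # One "seek": position of the first entry >= (prefix,).
    pos = bisect_left(index, (prefix,))
    ids = []
    # Sequential scan until the prefix stops matching.
    while pos < len(index) and index[pos][0].startswith(prefix):
        ids.append(index[pos][1])
        pos += 1
    return ids


print(word_ids_for_prefix(index, "bacon"))  # [12, 58, 31]
```

Because word_id sits in the index entry itself, the model never has to look anywhere else, which is the "satisfy the query directly from the index" idea.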
So, give those indexes a try. Your large table, the wordmatch table, should work fairly well without partitioning. It's more common to partition tables that contain lots of content (like the text of articles) than it is to partition this kind of fixed-row-size join table.
Notice that your EXPLAIN announces it will look at all the partitions because EXPLAIN can't tell which partition (or partitions) your w.word_text LIKE 'bacon%' WHERE-clause will need to examine. EXPLAIN isn't as dumb as a box of hammers, but it is close. MySQL won't examine the partitions it doesn't need to, but it doesn't know which partitions are involved until runtime.
Have you considered using FULLTEXT search? It might simplify what you're doing.
Your first query doesn't have any filtering conditions on the wordmatch table that could limit the partitions in use, thus it needs to access all partitions. There is no way to redo this query to use only necessary partitions without adding a filter on the field that is the basis for the partitioning (word_id).
The second query filters on a specific word_id value, so the optimizer knows exactly which partition to read.
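For an integer column, PARTITION BY HASH (word_id) PARTITIONS 11 places a row in partition MOD(word_id, 11). A small Python sketch of that arithmetic shows why a constant prunes to one partition, while a set of ids known only at runtime may not. The list of ids matched by 'bacon%' is invented for illustration.

```python
NUM_PARTITIONS = 11  # from PARTITIONS 11 in the wordmatch definition


def partition_for(word_id, num_partitions=NUM_PARTITIONS):
    """Name of the HASH partition a non-negative integer key maps to."""
    return "p%d" % (word_id % num_partitions)


# The constant from the second query maps to exactly one partition.
print(partition_for(219657))  # p9

# Hypothetical word_ids matched by LIKE 'bacon%'. They are only known
# once the wordlist lookup has run, and they can land in any partition.
matched_ids = [219657, 219658, 300012]
touched = {partition_for(i) for i in matched_ids}
print(touched)
```

The p9 result agrees with the partitions column in the second EXPLAIN output.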
I would also agree with the comment made by @OllieJones: I am not sure you should really worry about partitioning at only 7M rows. That is not really that big of a table in the grand schema of things.
Assume that I properly query the partition key in every query. Is there any sensible reason to place the partition key anywhere but first in line?
I feel like there's something I'm not understanding about how the index works. Assume MySQL and InnoDB.
I think I get that, ordinarily, you place the most selective keys first and the less selective ones later. And the partition key would ordinarily be one of the less selective ones. But if the partition key is included in every query, what difference does it make to include the partition key first? Wouldn't this help in other ways, too? E.g., I won't have to include the partition key in every index if it's up front in the primary-key index: queries using other indexes can borrow the primary key from the primary-key index consistent with the leftmost-key constraint.
And I don't know if an index itself is ever partitioned but it seems like it could be if it's a covering index. (Am I right?) If so, the partition key would have to be first, no, for the partitions to work?
E.g.:
CREATE TABLE `fee` (
`fi` INT ,
`fo` INT ,
PRIMARY KEY ( `fi` , `fo` )
) ENGINE = INNODB
PARTITION BY RANGE ( `fi` ) (
. . .
);
Or . . .
CREATE TABLE `fee` (
`fi` INT ,
`fo` INT ,
PRIMARY KEY ( `fo` , `fi` )
) ENGINE = INNODB
PARTITION BY RANGE ( `fi` ) (
. . .
);
Which, if either, is inherently better, and why or why not?
Thank you for your time.
The selectivity of the two columns doesn't matter as much as some people think.
If you were to query the table as:
SELECT ... FROM fee WHERE fi=? AND fo=?
Then what does it matter if it searches the B-tree by fi,fo or by fo,fi? It'll find the same record in the end, and it'll take roughly the same number of steps to do that. There's a theoretical difference, but in most cases it won't make a significant difference.
What's more important is if you have queries that only search for one or the other column of the primary key.
You mentioned that all queries search on the partition column, that's fi in this example. Do you have any queries that search on fi but not fo?
SELECT ... FROM fee WHERE fi=?
If fi were the first column of the primary key, this would do partition-pruning, and also use the PRIMARY KEY index because your search term is on the first column.
mysql> explain partitions select * from fee where fi = 175;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | fee | p2 | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
Whereas if fi were the second column of the primary key, then it could do partition-pruning, but not use the index.
mysql> explain partitions select * from fee where fi = 175;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | fee | p2 | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
Indexes are also partitioned. Think of partitioning as a series of completely separate tables, with the same columns and same indexes, just a subset of the rows. Once the query determines which partition to read, it does the query the same way it would against a non-partitioned table, choosing an index based on the query criteria. So will it use the primary key to search? Not necessarily:
mysql> explain partitions select * from fee where fi = 175 and created_at < now();
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | fee | p2 | range | created_at | created_at | 6 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
Here we see the condition on fi resulted in partition pruning, and yet the index on created_at was preferred by the optimizer. It searches that index in the respective partition.
"you place the most selective keys first and the less selective ones later" -- No. That is an old wives tale.
Putting the columns that are tested with '=' first is a simpler and more important rule.
Think of a composite InnoDB BTree index as working this way. Concatenate all the columns together, then picture the BTree as having a single string as the key.
Putting the "partition key" first in an index is the least useful place! You are already pruning on that; having it in the index is actually redundant. However, it is necessary for any UNIQUE key (that includes the PRIMARY KEY).
Yes, you correctly observed that the PK columns are implicitly included in every secondary key, hence the partition key is included.
Note that if the partition key is not really part of a desired UNIQUE key, then the uniqueness constraint is not possible (in MySQL). However, the tacked-on PK is not part of the uniqueness constraint. Since MySQL is only willing to check uniqueness for one partition, you must include the partition key to also provide the semantics that states "Unique" across the entire table. (Yeah, it is a bit convoluted; live with it.)
In your example, if you do SELECT ... WHERE fi BETWEEN 1 AND 2 AND fo=3, any index (the PK is an index) starting with fi would work harder than if fo were first in the index.
So, a Rule of Thumb is to move the partition key to the end of any index that includes it. (I have seen only one rare exception; I forget the details.)
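To make the BETWEEN example concrete, here is a toy Python model (not MySQL internals) that treats each composite index as a sorted list of tuples, in the spirit of the "concatenate the columns" picture. The 5x5 data set and the function names are invented; the point is how many index entries each column order has to walk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table contents: every (fi, fo) pair for values 1..5.
rows = [(fi, fo) for fi in range(1, 6) for fo in range(1, 6)]

index_fi_fo = sorted(rows)                         # index on (fi, fo)
index_fo_fi = sorted((fo, fi) for fi, fo in rows)  # index on (fo, fi)


def scan_fi_first(index, fi_lo, fi_hi, fo):
    """Range on the leading column: walk the whole fi range, filter on fo."""
    lo = bisect_left(index, (fi_lo,))
    hi = bisect_right(index, (fi_hi, float("inf")))
    examined = index[lo:hi]
    matches = [(fi, f) for fi, f in examined if f == fo]
    return len(examined), matches


def scan_fo_first(index, fo, fi_lo, fi_hi):
    """Equality on the leading column: only the wanted entries are walked."""
    lo = bisect_left(index, (fo, fi_lo))
    hi = bisect_right(index, (fo, fi_hi))
    examined = index[lo:hi]
    matches = [(fi, f) for f, fi in examined]
    return len(examined), matches


# WHERE fi BETWEEN 1 AND 2 AND fo = 3
print(scan_fi_first(index_fi_fo, 1, 2, 3))  # (10, [(1, 3), (2, 3)])
print(scan_fo_first(index_fo_fi, 3, 1, 2))  # (2, [(1, 3), (2, 3)])
```

Both orders return the same two rows, but the index that starts with the '=' column examines 2 entries instead of 10 in this model.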
I have a table below:
CREATE TABLE `student` (
`name` varchar(30) NOT NULL DEFAULT '',
`city` varchar(30) NOT NULL DEFAULT '',
`age` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`name`,`city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I want to know: if I execute the following two SQL statements, do they have the same performance?
mysql> select * from student where name='John' and city='NewYork';
mysql> select * from student where city='NewYork' and name='John';
Related questions:
If there is a multi-column index on (name, city), do both statements use it?
Does the optimizer rewrite the second statement into the first because of the index?
I ran EXPLAIN on both of them; the results are below:
mysql> explain select * from student where name='John' and city='NewYork';
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| 1 | SIMPLE | student | const | PRIMARY | PRIMARY | 184 | const,const | 1 | NULL |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
mysql> explain select * from student where city='NewYork' and name='John';
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
| 1 | SIMPLE | student | const | PRIMARY | PRIMARY | 184 | const,const | 1 | NULL |
+----+-------------+---------+-------+---------------+---------+---------+-------------+------+-------+
If, given an index on (name,city), I execute the following two SQLs, do they have the same performance?
where name='John' and city='NewYork'
where city='NewYork' and name='John'
Yes.
The query planner doesn't care about the order of WHERE clauses. If both your clauses filter on equality, the planner can use the index. SQL is a declarative language, not procedural. That is, you say what you want, not how to get it. It's a little counterintuitive for many programmers.
It can also use the (name,city) index for WHERE name LIKE 'Raymo%' because name is first in the index. It cannot use that index for WHERE city = 'Kalamazoo', though.
It can use the index for WHERE city LIKE 'Kalam%' AND name = 'Raymond'. In that case it uses the index to find the name value, then scans for matching cities.
If you had an index on (city,name) you could also use that for WHERE city = 'Kalamazoo' AND name = 'Raymond'. If both indexes exist, the query planner will pick one, probably based on some kind of cardinality consideration.
Note. If instead you have two different indexes on city and name, the query planner can't (as of mid-2017) use more than one of them to satisfy WHERE city = 'Kalamazoo' AND name = 'Raymond'.
See http://use-the-index-luke.com/ for good info.
The order of columns in a multi-column index matters.
The documentation of the multiple-column indexes reads:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
This means an index on columns name and city can be used when an index on column name is needed but it cannot be used instead of an index on column city.
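That leftmost-prefix rule can be sketched in a few lines of Python. This is a deliberate simplification (equality conditions only, and the function name is hypothetical), not how the optimizer is implemented:

```python
def usable_prefix(index_columns, equality_columns):
    """Leading index columns that a set of '=' conditions can use.

    equality_columns is a set, so the order in which the conditions
    were written in the WHERE clause cannot influence the result.
    """
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        used.append(column)
    return used


index = ("name", "city")
print(usable_prefix(index, {"name", "city"}))  # ['name', 'city']
print(usable_prefix(index, {"city", "name"}))  # ['name', 'city']
print(usable_prefix(index, {"name"}))          # ['name']
print(usable_prefix(index, {"city"}))          # []
```

The last call is the case where an index on (name, city) cannot stand in for an index on city.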
The order of conditions in the WHERE clause doesn't matter. The MySQL optimizer does a lot of work on the conditions on the WHERE clause to eliminate as many candidate rows as possible as early as possible and to read as little data as possible from the tables and indexes (because some of the read data is dropped because it doesn't match the entire WHERE clause).
My query took 28.39 seconds to run. How can I optimize it?
explain SELECT distinct UNIX_TIMESTAMP(timestamp)*1000 as timestamp,count(a.sig_name) as counter from event a,network n where n.fsi='pays' and n.net=inet_ntoa(a.ip_src) group by date(timestamp) order by timestamp asc;
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 8177074 | Using temporary; Using filesort |
| 1 | SIMPLE | n | eq_ref | PRIMARY,fsi | PRIMARY | 77 | func | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
So generally looking at your query, we find that table event a is examining 8,177,074 rows. That is likely the "root" of the slowness, so we want to look at how to reduce the search space using indexes.
The main condition on event a is
n.net=inet_ntoa(a.ip_src)
The problem here is that we need to perform a calculation (inet_ntoa) on every row of a.ip_src, so there is no alternative but to scan the entire table. A potentially better solution would be to invert the comparison and ensure that a.ip_src is indexed.
a.ip_src=inet_aton(n.net)
This will only be better if we are matching fewer rows in n than we are in a. If that is not the case, you should seriously consider caching the result of this function in the table and creating an index on that.
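The idea of inverting the comparison can be shown with a small Python sketch. It assumes, as the original query implies, that ip_src is stored as an integer and net as a dotted-quad string; the sample rows and the helper name are made up.

```python
import ipaddress


def inet_aton(dotted):
    """Integer form of a dotted-quad IPv4 string, like MySQL's INET_ATON()."""
    return int(ipaddress.IPv4Address(dotted))


# Hypothetical contents: a few network rows, many event rows.
network_nets = ["10.0.0.1", "192.168.0.1"]
event_ip_src = [167772161, 3232235521, 167772162, 167772161]

# Convert the small side once...
wanted = {inet_aton(net) for net in network_nets}

# ...then compare the stored integers as-is. No function is applied to
# ip_src, which is what lets an index on that column be used.
matches = [ip for ip in event_ip_src if ip in wanted]
print(matches)  # [167772161, 3232235521, 167772161]
```

The conversion runs once per network row rather than once per event row, which is where the saving comes from when network is the smaller side.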
Lastly, I am guessing the timestamp column is in event a, in which case an index may help with the ordering and grouping, though there is no guarantee. You could try a multi-column index on (ip_src,timestamp).
Make it a practice to put at least an index on columns that are used in WHERE/JOIN clauses. I say "at least" because in many cases you should use PRIMARY/FOREIGN KEY relations instead. If a column is already a primary or foreign key, there is no need to index it further.
The above query can be improved simply by adding an index with the following statement:
ALTER TABLE event ADD INDEX idx_ev_ipsrc (ip_src);
Here idx_ev_ipsrc is the name of the index, and ip_src is the column to be indexed.
A further enhancement:
Add a multi-column index on the network table with the following statement:
ALTER TABLE network ADD INDEX idx_net_fsi_net (fsi,net);
This should reduce the number of rows examined even further.
Note: The above statements are for MySQL and can be tailored for other DBs easily.
I created a table partitioned on a numeric ID:
CREATE TABLE mytable (
...
`id` int(11) DEFAULT NULL
...
) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY HASH (`id`) PARTITIONS 100
I have no primary key, but a number of indices. I don't have any data in my table where id is less than 0 or greater than 30 (at the moment, I expect this to grow). Most of my queries first include the id to reduce the search space.
I figured a query to select distinct(id) from mytable would then just return the number of partitions that had data in it. I was surprised that an explain on this instead does a full scan of the data:
explain partitions select distinct(id) from mytable;
| 1 | SIMPLE | mytable | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99 | ALL | NULL | NULL | NULL | NULL | 24667132 | Using temporary |
explain select distinct(id) from mytable;
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
| 1 | SIMPLE | mytable | ALL | NULL | NULL | NULL | NULL | 24667132 | Using temporary |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
I then read this stackoverflow answer, which explained how MySQL's partition hash() function works.
My question is, how can I get MySQL to map each id in the table into its own partition such that selects with the id narrow the search to a single table (and a select distinct() just has to count the number of partitions and not scan them)?
I'm using Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu).
First off, you're conflating two different things. One is the fact that a SELECT ... WHERE id = ? should only search one partition. You mentioned this but didn't say whether it currently works (given your table definition, I don't see why it shouldn't).
The second thing, having a SELECT distinct(id) touch only the partitioning information, is very different. If I understand you correctly, you're assuming that one partition holds only one id value. That is not how HASH partitioning works, though. It works much like a traditional hash table, mapping a large key space onto a small one (in your case, 100 partitions), so each partition can hold many possible ids. Since MySQL does not keep track of which of the possible ids are really in a partition, all it can do is scan each partition, do the DISTINCT, and give back the result. That said, it could do the DISTINCT operation on the individual partitions instead of the whole table, and it could do this in parallel; however, the EXPLAIN seems to imply that it will create one big temporary table to do the DISTINCT, likely because this optimization hasn't been implemented yet.
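A quick Python sketch of the modulo mapping behind HASH (id) PARTITIONS 100 shows both halves of this. The "future" id values are hypothetical; the current range 0 to 30 comes from the question.

```python
from collections import defaultdict

NUM_PARTITIONS = 100  # from PARTITIONS 100 in the table definition


def partition_of(id_value):
    """Partition number for a non-negative integer id under HASH (id)."""
    return id_value % NUM_PARTITIONS


# With ids 0..30, every id happens to get a partition to itself.
partitions_today = {partition_of(i) for i in range(0, 31)}
print(len(partitions_today))  # 31

# Hypothetical later ids collide with existing ones once they pass 99.
by_partition = defaultdict(list)
for id_value in [5, 30, 105, 130, 205]:
    by_partition[partition_of(id_value)].append(id_value)
print(dict(by_partition))  # {5: [5, 105, 205], 30: [30, 130]}
```

So knowing that partition 30 is non-empty does not say which ids it holds, and that is why DISTINCT still has to read the rows.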
I've been testing out different ideas for optimizing some of the tables we have in our system at work. Today I came across a table that tracks every view on each vehicle in our system. Create table below.
SHOW CREATE TABLE vehicle_view_tracking;
CREATE TABLE `vehicle_view_tracking` (
`vehicle_view_tracking_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`public_key` varchar(45) NOT NULL,
`vehicle_id` int(10) unsigned NOT NULL,
`landing_url` longtext NOT NULL,
`landing_port` int(11) NOT NULL,
`http_referrer` longtext,
`created_on` datetime NOT NULL,
`created_on_date` date NOT NULL,
`server_host` longtext,
`server_uri` longtext,
`referrer_host` longtext,
`referrer_uri` longtext,
PRIMARY KEY (`vehicle_view_tracking_id`),
KEY `vehicleViewTrackingKeyCreatedIndex` (`public_key`,`created_on_date`),
KEY `vehicleViewTrackingKeyIndex` (`public_key`)
) ENGINE=InnoDB AUTO_INCREMENT=363439 DEFAULT CHARSET=latin1;
I was playing around with multi-column and single column indexes. I ran the following query:
EXPLAIN EXTENDED SELECT dealership_vehicles.vehicle_make, dealership_vehicles.vehicle_model, vehicle_view_tracking.referrer_host, count(*) AS count
FROM vehicle_view_tracking
LEFT JOIN dealership_vehicles
ON dealership_vehicles.dealership_vehicle_id = vehicle_view_tracking.vehicle_id
WHERE vehicle_view_tracking.created_on_date >= '2011-09-07' AND vehicle_view_tracking.public_key IN ('ab12c3')
GROUP BY (dealership_vehicles.vehicle_make) ASC , dealership_vehicles.vehicle_model, referrer_host
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | vehicle_view_tracking | range | vehicleViewTrackingKeyCreatedIndex,vehicleViewTrackingKeyIndex | vehicleViewTrackingKeyCreatedIndex | 50 | NULL | 23086 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dealership_vehicles | eq_ref | PRIMARY | PRIMARY | 8 | vehicle_view_tracking.vehicle_id | 1 | 100.00 | |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
(Execution time for actual select query was .309 seconds)
Then I changed the date in the WHERE clause from '2011-09-07' to '2011-07-07' and got the following EXPLAIN results:
EXPLAIN EXTENDED SELECT dealership_vehicles.vehicle_make, dealership_vehicles.vehicle_model, vehicle_view_tracking.referrer_host, count(*) AS count
FROM vehicle_view_tracking
LEFT JOIN dealership_vehicles
ON dealership_vehicles.dealership_vehicle_id = vehicle_view_tracking.vehicle_id
WHERE vehicle_view_tracking.created_on_date >= '2011-07-07' AND vehicle_view_tracking.public_key IN ('ab12c3')
GROUP BY (dealership_vehicles.vehicle_make) ASC , dealership_vehicles.vehicle_model, referrer_host
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | vehicle_view_tracking | ref | vehicleViewTrackingKeyCreatedIndex,vehicleViewTrackingKeyIndex | vehicleViewTrackingKeyIndex | 47 | const | 53676 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dealership_vehicles | eq_ref | PRIMARY | PRIMARY | 8 | vehicle_view_tracking.vehicle_id | 1 | 100.00 | |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
(Execution time for actual select query was .670 seconds)
I see 4 main changes:
type changed from range to ref
key changed from vehicleViewTrackingKeyCreatedIndex to vehicleViewTrackingKeyIndex
key_len changed from 50 to 47 (caused by the change in key)
rows changed from 23086 to 53676 (caused by the change in key)
At this point, the execution time is only .6 seconds for the slow query; however, we only have about 10% of our vehicles in our database.
It's getting late and I may have overlooked something in the mysql docs but I can't seem to find why the key (and in turn the type and rows) are changing when the date is changed in the where clause.
The help is greatly appreciated. I searched for someone having the same/similar issue with a date causing this change and was not able to find anything. If I missed a previous post, please link me :-)
Different search strategies make sense for different data. In particular, index scans (such as range) often have to do a seek to actually read the row. At some point, doing all those seeks is slower than not using the index at all.
Take a trivial example, a table with three columns: id (primary key), name (indexed), birthday. Say it has a lot of data. If you ask MySQL to look for Bob's birthday, it can do that fairly quickly: first, it finds Bob in the name index (this takes a few seeks, log(n) where n is the row count), then one additional seek to read the actual row in the data file and read the birthday from it. That's very quick, and far quicker than scanning the entire table.
Next, consider doing a name LIKE 'Z%'. That is probably a fairly small portion of the table, so it's still faster to find where the Zs start in the name index, then for each one seek the data file to read the row. (This is a range scan.)
Finally, consider asking for all names starting with M-Z. That's probably around half the data. It could do a range scan, and then a lot of seeks, but seeking randomly over the datafile with the ultimate goal of reading half the rows isn't optimal: it'd be faster to just do a big sequential read over the data file. So, in this case, the index will be ignored.
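The trade-off in these three cases can be put into a toy cost model. The per-row costs below are invented numbers chosen only to show the crossover; MySQL's real cost model is considerably more detailed.

```python
def cheaper_plan(total_rows, matching_rows, seek_cost=10.0, sequential_cost=1.0):
    """Pick the cheaper of two plans under made-up per-row costs.

    index range scan: one random seek into the data per matching row
    full scan:        one cheap sequential read per row in the table
    """
    index_cost = matching_rows * seek_cost
    scan_cost = total_rows * sequential_cost
    return "index range scan" if index_cost < scan_cost else "full scan"


TOTAL = 1_000_000
print(cheaper_plan(TOTAL, 1))        # Bob's birthday: index range scan
print(cheaper_plan(TOTAL, 40_000))   # names starting with Z: index range scan
print(cheaper_plan(TOTAL, 500_000))  # names M-Z: full scan
```

With these assumed costs the index stops paying off once more than a tenth of the table matches; the real threshold depends on the storage and on the optimizer's statistics.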
This is what you're seeing, except in your case there is another key it can fall back on. (It's also possible that it would actually use the date index if it didn't have the other; it should pick whichever index will be quickest. Beware that MySQL's optimizer often makes errors in this.)
So, in short, this is expected. A query doesn't say how to retrieve the data, rather it says what data to retrieve. The database's optimizer is supposed to find the quickest way to retrieve it.
You may find that an index on both columns, in the order (public_key,created_on_date), is preferred in both cases and speeds up your query. This is because MySQL generally uses only one index per table (per query). Also, the date goes at the end because a range scan can only be done efficiently on the last column in an index.
[InnoDB actually has another layer of indirection, I believe, but it'd just confuse the point. It doesn't make a difference to the explanation.]