Why doesn't MySQL use my partitions as indices? - mysql

I created a table partitioned on a numeric ID:
CREATE TABLE mytable (
...
`id` int(11) DEFAULT NULL
...
) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY HASH (`id`) PARTITIONS 100
I have no primary key, but a number of indices. I don't have any data in my table where id is less than 0 or greater than 30 (at the moment, I expect this to grow). Most of my queries first include the id to reduce the search space.
I figured a query to select distinct(id) from mytable would then just return the number of partitions that had data in it. I was surprised that an explain on this instead does a full scan of the data:
explain partitions select distinct(id) from mytable;
| 1 | SIMPLE | mytable | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99 | ALL | NULL | NULL | NULL | NULL | 24667132 | Using temporary |
explain select distinct(id) from mytable;
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
| 1 | SIMPLE | mytable | ALL | NULL | NULL | NULL | NULL | 24667132 | Using temporary |
+----+-------------+----------------------+------+---------------+------+---------+------+----------+-----------------+
I then read this Stack Overflow answer, which explained how MySQL's partition hash() function works.
My question is, how can I get MySQL to map each id in the table into its own partition such that selects with the id narrow the search to a single partition (and a select distinct() just has to count the number of partitions and not scan them)?
I'm using Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu).

First off, you're conflating two different things. One is the fact that a SELECT WHERE id = ? should only search one partition. You mentioned this but didn't say whether it currently works or not (given your table definition, I don't see why it shouldn't).
The second thing, having SELECT DISTINCT(id) touch only the partitioning information, is very different from this. If I understand you correctly, you're assuming that each partition holds only one id value. That is not how HASH partitioning works, though. It works much like a traditional hash table, mapping a large key space onto a small one, in your case 100 partitions, so each partition can hold many different ids. Since MySQL does not keep track of which of the possible ids are actually present in a partition, all it can do is scan each partition, apply the DISTINCT, and give back the result. That said, it could do the DISTINCT operation on the individual partitions instead of the whole table, and it could do this in parallel; however, the EXPLAIN output seems to imply that it will create one big temporary table to do the DISTINCT, likely because this optimization hasn't been implemented yet.
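To make the many-to-one mapping concrete, here is a minimal Python sketch. It assumes the documented behaviour that PARTITION BY HASH (expr) PARTITIONS n places a row in partition MOD(expr, n) for a non-negative integer expression; the function name `partition_for` is made up for illustration.

```python
def partition_for(id_value: int, num_partitions: int = 100) -> int:
    """Partition number for a non-negative integer key under HASH partitioning.

    Assumption: MySQL computes MOD(expr, num_partitions) for PARTITION BY HASH.
    """
    return id_value % num_partitions


# The ids 0..30 that exist today happen to land in 31 different partitions ...
current = {partition_for(i) for i in range(31)}

# ... but the mapping is many-to-one: 30, 130 and 230 all share partition 30,
# so knowing that a partition has rows does not tell you which ids it holds.
shared = [partition_for(i) for i in (30, 130, 230)]
```

So even with one id per partition at the moment, the server would still have to read the rows to learn which ids are present.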

MySQL seeking pagination on big composite primary key

Let's say I have a MySQL table defined like this:
CREATE TABLE big_table (
primary_1 varbinary(1536),
primary_2 varbinary(1536),
ts timestamp(6),
...
PRIMARY KEY (primary_1, primary_2),
KEY ts_idx (ts)
)
I would like to implement efficient pagination (seeking pagination) as described in this blog post https://use-the-index-luke.com/sql/partial-results/top-n-queries
If I only use the first part of the primary key, the pipelined execution works fast and as expected:
mysql> explain select * from big_table order by ts, primary_1 limit 5;
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
| 1 | SIMPLE | big_table | NULL | index | NULL | ts_idx | 7 | NULL | 5 | 100.00 | NULL |
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
However, if I add the second part of the primary key to the ORDER BY clause everything slows down and filesort starts being used:
mysql> explain select * from big_table order by ts, primary_1, primary_2 limit 5;
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| 1 | SIMPLE | big_table | NULL | ALL | NULL | NULL | NULL | NULL | 6388499 | 100.00 | Using filesort |
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
Is it not possible to do this pipelined execution and ordering on a composite primary key? Or should the query be written in some special way?
Without prior knowledge about how MySQL works internally, there is no reason to assume that an index on just ts can be used to order by ts, primary_1 without doing an additional (file)sort on primary_1. Imagine e.g. the edge case that all values for ts are the same - the index will just give you all rows, which you then have to sort by primary_1.
Nevertheless, MySQL can make use of some additional information: InnoDB stores secondary indexes in a way that includes the primary key columns (to be able to find the actual row in the table). Since that information is there anyway, MySQL can just make use of it - and it does, by using Index Extensions. This basically extends the index ts to an index ts, primary_1, primary_2.
So this technical trick allows you to use the index on ts to order by ts, primary_1, primary_2. But since there is always a "but", here is the "but":
Use of index extensions by the optimizer is subject to the usual limits on the number of key parts in an index (16) and the maximum key length (3072 bytes).
The index on ts, primary_1, primary_2 would be longer than 3072 bytes. You can e.g. also not create such an index manually. So this extension doesn't work anymore, and MySQL falls back to treating the index on ts like an index on just ts.
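The arithmetic behind that limit can be checked with a short sketch. It uses the raw column widths from the table definition and the 3072-byte limit quoted above; per-column length and NULL-flag overhead is ignored (an assumption that only makes the totals smaller), and the helper name `fits` is hypothetical.

```python
MAX_KEY_BYTES = 3072  # maximum key length quoted above

# Raw column widths taken from the table definition.
TS_BYTES = 7            # timestamp(6); matches key_len 7 in the EXPLAIN output
PRIMARY_1_BYTES = 1536  # varbinary(1536)
PRIMARY_2_BYTES = 1536  # varbinary(1536)


def fits(*column_bytes: int) -> bool:
    """True if an index over these columns stays within the key length limit."""
    return sum(column_bytes) <= MAX_KEY_BYTES


full_extension = TS_BYTES + PRIMARY_1_BYTES + PRIMARY_2_BYTES  # 3079 bytes
partial_extension = TS_BYTES + PRIMARY_1_BYTES                 # 1543 bytes
```

The full extension overshoots the limit by only a few bytes, while an extension by just the first primary key column would fit comfortably.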
So why does it work for order by ts, primary_1? Well, even if, for those technical reasons, MySQL cannot create an internal index on ts, primary_1, primary_2, it could at least do it for ts, primary_1 without running into technical problems. MySQL actually doesn't do that though - but the MariaDB developers implemented this trick, so I assume you are actually using MariaDB. Nevertheless, the length restriction of 3072 still applies, so your order by both primary columns still won't work.
What can you do?
If you can shorten your primary keys a bit, the index extension would work again. Primary keys that long (and of that type) are uncommon and impractical anyway (not only for this use case), so maybe you can find a different primary key for your table.
If that is not an option, you may be able to utilize some prior knowledge about your data distribution, e.g. if you know that at most 10 values for ts can be the same, you can first pick the first n+10 rows (using the index), then order only those by the primary keys. If you usually only show the first few pages, this might speed up your specific situation. But you may want to ask a separate question for it with specific details.
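The "pick n+10 rows, then sort only those" idea can be sketched in Python as follows. This is an illustration under the stated assumption that no more than `max_ties` rows ever share one ts value; the function name and the in-memory row list are hypothetical stand-ins for what an index scan on ts would deliver.

```python
from typing import List, Tuple

Row = Tuple[int, bytes, bytes]  # (ts, primary_1, primary_2)


def first_page(rows_by_ts: List[Row], n: int, max_ties: int) -> List[Row]:
    """Return the first n rows in (ts, primary_1, primary_2) order.

    rows_by_ts must already be ordered by ts only (ties in any order), the way
    an index on ts would return them. max_ties is the assumed upper bound on
    how many rows can share a single ts value.
    """
    candidates = rows_by_ts[: n + max_ties]  # read only a short prefix
    candidates.sort()                        # small in-memory sort on all three columns
    return candidates[:n]
```

The prefix is long enough because every row with a ts at or below the n-th row's ts is among the first n + max_ties rows. In SQL the same shape would be an inner query with ORDER BY ts LIMIT n+10 wrapped in an outer ORDER BY ts, primary_1, primary_2 LIMIT n, though whether the optimizer executes it as hoped would need to be checked with EXPLAIN.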

Optimize query?

My query took 28.39 seconds to run. How can I optimize it?
explain SELECT distinct UNIX_TIMESTAMP(timestamp)*1000 as timestamp,count(a.sig_name) as counter from event a,network n where n.fsi='pays' and n.net=inet_ntoa(a.ip_src) group by date(timestamp) order by timestamp asc;
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 8177074 | Using temporary; Using filesort |
| 1 | SIMPLE | n | eq_ref | PRIMARY,fsi | PRIMARY | 77 | func | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+------+---------+---------------------------------+
So generally looking at your query, we find that table event a is examining 8,177,074 rows. That is likely the "root" of the slowness, so we want to look at how to reduce the search space using indexes.
The main condition on event a is
n.net=inet_ntoa(a.ip_src)
The problem here is that we need to perform a calculation (inet_ntoa) on every row of a.ip_src, so there is no alternative but to scan the entire table. A potentially better solution would be to invert the comparison and ensure that a.ip_src is indexed.
a.ip_src=inet_aton(n.net)
This will only be better if we are matching fewer rows in n than we are in a. If that is not the case, you should seriously consider caching the result of this function in the table and creating an index on that.
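A small Python sketch of why inverting the comparison helps. It assumes, as the query suggests, that event.ip_src holds the address as an integer and network.net holds it as a dotted string; the helper functions mimic INET_ATON and INET_NTOA for ordinary dotted-quad IPv4 input only, and the sample addresses are invented.

```python
import ipaddress


def inet_aton(dotted: str) -> int:
    """Dotted-quad IPv4 string to unsigned 32-bit integer (like MySQL INET_ATON)."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number: int) -> str:
    """Unsigned 32-bit integer to dotted-quad string (like MySQL INET_NTOA)."""
    return str(ipaddress.IPv4Address(number))


# Hypothetical data: a handful of networks and a (normally huge) list of events.
networks = ["10.0.0.1", "192.168.0.1"]
event_ip_src = [167772161, 3232235521, 134744072]

# Convert the small side once; the big side is compared as stored, which is
# what lets an index on ip_src be used for the lookup.
wanted = {inet_aton(net) for net in networks}
matches = [ip for ip in event_ip_src if ip in wanted]
```

The original form does the opposite: it converts every one of the 8 million ip_src values before anything can be compared.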
Lastly, I am guessing the timestamp column is in event a, in which case an index may help with the ordering and grouping, though it may not. You could try a multi-column index on (ip_src, timestamp).
Make it a practice to have at least an index on columns that are used in WHERE/JOIN clauses. I say "at least" because in many cases one should try to use PRIMARY/FOREIGN KEY relations. If something is already a primary/foreign key, there is no need to index it further.
The above query can be simply improved by introducing the INDEX through the following query:
ALTER TABLE event ADD INDEX idx_ev_ipsrc (ip_src);
Here idx_ev_ipsrc = Name of the index key, and ip_src is the column to be indexed.
A further enhancement:
Introduce a multi-column index on the network table using the following query:
ALTER TABLE network ADD INDEX idx_net_fsi_net (fsi,net);
This should reduce the number of rows examined even further.
Note: The above queries are for MySQL and can easily be tailored for other databases.

MySQL join on table with partitions selecting all partitions

I have a photo gallery on my site with 1M photos in it. There are 2 search tables associated with it. Table #1 contains a list of words used in the photos. Table #2 contains a list of what words match up with what photos. Table #2 is 7M rows. I am testing partitioning this 7M row table because I have another set of tables with 120,000,000 rows. Queries against the 120M row wordmatch table below, with or without a join against the wordlist table below, take multiple seconds to run.
I am trying to perform a join between these 2 tables and MySQL 5.6 EXPLAIN PARTITIONS shows it is using all the partitions. How can I redo this query to make this correctly use only a single partition?
The 2 tables:
CREATE TABLE wordlist (
word_text varchar(50) NOT NULL DEFAULT '',
word_id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (word_text),
KEY word_id (word_id)
) ENGINE=InnoDB
CREATE TABLE wordmatch (
pic_id int(11) unsigned NOT NULL DEFAULT '0',
word_id mediumint(8) unsigned NOT NULL DEFAULT '0',
title_match tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (word_id,pic_id,title_match),
KEY pic_id (pic_id)
) ENGINE=InnoDB
/*!50100 PARTITION BY HASH (word_id)
PARTITIONS 11 */;
SQL query I am performing:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon' AND m.word_id = w.word_id
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| 1 | SIMPLE | w | NULL | range | PRIMARY,word_id | PRIMARY | 52 | NULL | 1 | Using where |
| 1 | SIMPLE | m | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10 | ref | PRIMARY | PRIMARY | 3 | w.word_id | 34 | Using index |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
The join produces a query that uses all partitions.
If I retrieve the word_id # first and go straight against the wordmatch table, everything is ok:
EXPLAIN PARTITIONS SELECT m.pic_id FROM wordmatch m WHERE m.word_id = 219657;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | m | p9 | ref | PRIMARY | PRIMARY | 3 | const | 18220 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
How do I get this to work correctly?
I prefer not to split this into multiple queries if possible.
You may have noticed I am using LIKE above. People will often search on bacon% to get plurals of words, etc.
Example:
SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon%' AND m.word_id = w.word_id
I realize this wildcard search may result in 2 or more partitions being selected. This is probably ok, although if there is a way to change the partitioning to prevent that, I welcome any tips.
Edit #1: Added details as my original question was confusing. I was testing my 7M row table first before doing my 120M row table.
Edit #2: Resolution to my overall issue: my performance issues seem to be resolved now that I have partitioned my 120M row table into 101 partitions, per this post: MySQL performance: partitions. I do not know whether MySQL goes against all the partitions at runtime (Ollie Jones says in the comments below that it does not, and that EXPLAIN PARTITIONS is incorrect), but it is fast now, so I am happy.
To get your query working with efficient indexing is probably a good idea before you dive into the partitioning project. Here's your query refactored to use JOIN:
SELECT m.pic_id
FROM wordlist w
JOIN wordmatch m ON w.word_id = m.word_id
WHERE w.word_text LIKE 'bacon%'
This query can use a compound index on wordlist (word_text, word_id). It will random-access the index for the first matching word_text, and then scan the index retrieving the word_id values until it gets to the last matching word_text.
It can also use your existing primary key on wordmatch, which starts with (word_id, pic_id). This speeds up your query because the database engine can satisfy it directly from the index without having to bat the hard drive back and forth to the table itself.
So, give those indexes a try. Your large table, the wordmatch table, should work fairly well without partitioning. It's more common to partition tables that contain lots of content (like the text of articles) than it is to partition this kind of fixed-row-size join table.
Notice that your EXPLAIN announces it will look at all the partitions because EXPLAIN can't tell which partition (or partitions) your w.word_text LIKE 'bacon%' WHERE-clause will need to examine. EXPLAIN isn't as dumb as a box of hammers, but it is close. MySQL won't examine the partitions it doesn't need to, but it doesn't know which partitions are involved until runtime.
Have you considered using FULLTEXT search? It might simplify what you're doing.
Your first query doesn't have any filtering conditions on wordmatch table that could limit the partitions in use, thus it needs to access all partitions. There is no way to redo this query to use only necessary partitions without adding a filter on the field that is the basis for the partitioning (word_id).
The second query filters on a specific word_id value, so the index knows exactly which partition to point to.
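The difference between the two plans can be modelled in a few lines of Python. This is only an illustration: it assumes HASH partitioning places a row in partition MOD(word_id, 11), and the function name `partitions_to_read` is hypothetical.

```python
from typing import List, Optional

NUM_PARTITIONS = 11  # from PARTITION BY HASH (word_id) PARTITIONS 11


def partitions_to_read(word_id: Optional[int]) -> List[str]:
    """Partitions a planner could restrict itself to.

    word_id is None when the value is not known at plan time, for example
    because it only becomes available from the joined wordlist row.
    """
    if word_id is None:
        return [f"p{i}" for i in range(NUM_PARTITIONS)]
    return [f"p{word_id % NUM_PARTITIONS}"]
```

For the constant 219657 this yields p9, the same partition the second EXPLAIN reports; without a known value the whole list p0 to p10 has to be shown.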
I would also agree with the comment made by @OllieJones: I am not sure you should really worry about partitioning at only 7M rows. That is not really that big of a table in the grand scheme of things.

Why does MySQL report that no key is used when I use RAND() in the WHERE clause?

I have a table that has 4,000,000 records.
The table was created like this: (user_id int, partner_id int, PRIMARY KEY (user_id)) engine=InnoDB;
I want to test the performance of selecting 100 records.
So I tested the following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( 1 );
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | MY_TABLE | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set, 1 warning (0.00 sec)
This is OK.
However, the result of this query is cached by MySQL, so the test is meaningless after the first run.
I then thought of a query that selects by a random value.
I tested the following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( select ceil( rand() ) );
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | MY_TABLE | index | NULL | PRIMARY | 4 | NULL | 3998727 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
But this is bad.
EXPLAIN shows that possible_keys is NULL.
So a full index scan is planned, and in fact it is much slower than the previous query.
Could you tell me how to write a query that selects by a random value while still using an index lookup?
Thanks
Using rand() in SQL is usually a sure-fire way to make the query slow. A common theme here is people using it in ORDER BY to get a random sequence. It's slow because not only does it throw away the indexes, but it also reads through the whole table.
However in your case, the fact that the function calls are in a sub-query ought to allow the outer query to still use its indexes. The fact that it isn't seems quite odd (so I've given the question a +1 vote).
My theory is that perhaps MySQL's optimiser is getting it wrong -- it's seeing the functions in the inner query, and deciding incorrectly that it can't use an index.
The only thing I can suggest to work around that is using force index to push MySQL into using the index you want.
See the definition of rand().
If I understand correctly, you are trying to get a random record from the database. If that is the case, again from the rand() definition:
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
SELECT * FROM table1, table2 WHERE a=b AND c<d ORDER BY RAND() LIMIT 1000;
It's a limitation of the MySQL optimizer: it can't tell that the subquery returns exactly one value, so it has to assume the subquery returns multiple rows with unpredictable values, potentially even all the values of user_id. Therefore it decides to do an index scan.
Here's a workaround:
mysql> explain select user_id from MY_TABLE use index (PRIMARY)
where user_id = ( select ceil( rand() ) );
Note that MySQL's RAND() function returns a value in the range 0 <= v < 1.0. If you CEIL() it, you'll likely get the value 1. Therefore you'll virtually always get the row where user_id=1. If you don't have such a row in your table, you'll get an empty set result. You certainly won't get a user chosen randomly among all your users.
To fix that problem, you'd have to multiply the rand() by the number of distinct user_id values. And that brings up the problem that you might have gaps, so a randomly chosen value won't match any existing user_id.
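The range problem is easy to see in a Python sketch. The function names are made up, and `max_id` stands for whatever upper bound you choose (for example the largest user_id); gaps in the id sequence are not handled here.

```python
import math
import random


def ceil_rand() -> int:
    """Mirrors CEIL(RAND()): RAND() lies in [0, 1), so the result is 1
    unless RAND() happens to return exactly 0."""
    return math.ceil(random.random())


def random_id(max_id: int) -> int:
    """Uniform integer in 1..max_id.

    SQL counterpart: FLOOR(1 + RAND() * max_id).
    """
    return random.randrange(1, max_id + 1)


samples = [random_id(4_000_000) for _ in range(1000)]
```

Scaling by the id range spreads the picks over the whole table, but any picked value that falls into a gap still returns no row.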
Re your comment:
You'll always see possible keys as NULL when you get an index scan (i.e., "type" is "index").
I tried your explain query on a similar table, and it appears that the optimizer can't figure out that the subquery is a constant expression. You can work around this limitation by calculating the random number in application code and then using the result as a constant value in your query:
select user_id from MY_TABLE use index (PRIMARY)
where user_id = $random;
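A runnable sketch of that pattern in Python. Note the assumptions: sqlite3 is used purely as a stand-in so the example runs anywhere (with MySQL you would use your usual client library and its own placeholder syntax), and the table contents and function name are invented for the demonstration.

```python
import random
import sqlite3

# In-memory stand-in for MY_TABLE with 100 consecutive ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER PRIMARY KEY, partner_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)", [(i, i * 10) for i in range(1, 101)]
)


def fetch_random_user(connection: sqlite3.Connection, max_id: int):
    """Pick the id in application code, then bind it as a plain constant."""
    chosen = random.randrange(1, max_id + 1)
    row = connection.execute(
        "SELECT user_id FROM my_table WHERE user_id = ?", (chosen,)
    ).fetchone()
    return chosen, row  # row is None if the chosen id falls into a gap


chosen, row = fetch_random_user(conn, 100)
```

Because the server only ever sees a literal value, the primary key lookup is planned the same way as in the first user_id IN (1) test.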

MySQL EXPLAIN 'type' changes from 'range' to 'ref' when the date in the where statement is changed?

I've been testing out different ideas for optimizing some of the tables we have in our system at work. Today I came across a table that tracks every view on each vehicle in our system. Create table below.
SHOW CREATE TABLE vehicle_view_tracking;
CREATE TABLE `vehicle_view_tracking` (
`vehicle_view_tracking_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`public_key` varchar(45) NOT NULL,
`vehicle_id` int(10) unsigned NOT NULL,
`landing_url` longtext NOT NULL,
`landing_port` int(11) NOT NULL,
`http_referrer` longtext,
`created_on` datetime NOT NULL,
`created_on_date` date NOT NULL,
`server_host` longtext,
`server_uri` longtext,
`referrer_host` longtext,
`referrer_uri` longtext,
PRIMARY KEY (`vehicle_view_tracking_id`),
KEY `vehicleViewTrackingKeyCreatedIndex` (`public_key`,`created_on_date`),
KEY `vehicleViewTrackingKeyIndex` (`public_key`)
) ENGINE=InnoDB AUTO_INCREMENT=363439 DEFAULT CHARSET=latin1;
I was playing around with multi-column and single column indexes. I ran the following query:
EXPLAIN EXTENDED SELECT dealership_vehicles.vehicle_make, dealership_vehicles.vehicle_model, vehicle_view_tracking.referrer_host, count(*) AS count
FROM vehicle_view_tracking
LEFT JOIN dealership_vehicles
ON dealership_vehicles.dealership_vehicle_id = vehicle_view_tracking.vehicle_id
WHERE vehicle_view_tracking.created_on_date >= '2011-09-07' AND vehicle_view_tracking.public_key IN ('ab12c3')
GROUP BY (dealership_vehicles.vehicle_make) ASC , dealership_vehicles.vehicle_model, referrer_host
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | vehicle_view_tracking | range | vehicleViewTrackingKeyCreatedIndex,vehicleViewTrackingKeyIndex | vehicleViewTrackingKeyCreatedIndex | 50 | NULL | 23086 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dealership_vehicles | eq_ref | PRIMARY | PRIMARY | 8 | vehicle_view_tracking.vehicle_id | 1 | 100.00 | |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+------------------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
(Execution time for actual select query was .309 seconds)
Then I changed the date in the WHERE clause from '2011-09-07' to '2011-07-07' and got the following EXPLAIN results:
EXPLAIN EXTENDED SELECT dealership_vehicles.vehicle_make, dealership_vehicles.vehicle_model, vehicle_view_tracking.referrer_host, count(*) AS count
FROM vehicle_view_tracking
LEFT JOIN dealership_vehicles
ON dealership_vehicles.dealership_vehicle_id = vehicle_view_tracking.vehicle_id
WHERE vehicle_view_tracking.created_on_date >= '2011-07-07' AND vehicle_view_tracking.public_key IN ('ab12c3')
GROUP BY (dealership_vehicles.vehicle_make) ASC , dealership_vehicles.vehicle_model, referrer_host
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | vehicle_view_tracking | ref | vehicleViewTrackingKeyCreatedIndex,vehicleViewTrackingKeyIndex | vehicleViewTrackingKeyIndex | 47 | const | 53676 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dealership_vehicles | eq_ref | PRIMARY | PRIMARY | 8 | vehicle_view_tracking.vehicle_id | 1 | 100.00 | |
+----+-------------+-----------------------+--------+----------------------------------------------------------------+-----------------------------+---------+----------------------------------------------+-------+----------+----------------------------------------------+
(Execution time for actual select query was .670 seconds)
I see 4 main changes:
type changed from range to ref
key changed from vehicleViewTrackingKeyCreatedIndex to vehicleViewTrackingKeyIndex
key_len changed from 50 to 47 (caused by the change in key)
rows changed from 23086 to 53676 (caused by the change in key)
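For reference, the two key_len values can be reproduced from the column definitions. The sketch assumes the usual accounting for key_len in EXPLAIN: a VARCHAR key part counts its maximum byte length plus 2 length bytes (plus 1 if the column is NULLable), and a NOT NULL DATE counts 3 bytes. The helper name is hypothetical.

```python
def varchar_key_len(chars: int, bytes_per_char: int = 1, nullable: bool = False) -> int:
    """Bytes reported for a VARCHAR key part: data + 2 length bytes (+1 if NULLable)."""
    return chars * bytes_per_char + 2 + (1 if nullable else 0)


DATE_KEY_LEN = 3  # a NOT NULL DATE column

# public_key is varchar(45) NOT NULL in a latin1 table (1 byte per character).
public_key_len = varchar_key_len(45)          # vehicleViewTrackingKeyIndex
compound_len = public_key_len + DATE_KEY_LEN  # vehicleViewTrackingKeyCreatedIndex
```

So 47 means only public_key is used, while 50 means both parts of the compound index take part in the lookup.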
At this point, the execution time is only .6 seconds for the slow query; however, we only have about 10% of our vehicles in our database.
It's getting late and I may have overlooked something in the mysql docs but I can't seem to find why the key (and in turn the type and rows) are changing when the date is changed in the where clause.
The help is greatly appreciated. I searched for someone having the same/similar issue with a date causing this change and was not able to find anything. If I missed a previous post, please link me :-)
Different search strategies make sense for different data. In particular, index scans (such as range) often have to do a seek to actually read the row. At some point, doing all those seeks is slower than not using the index at all.
Take a trivial example, a table with three columns: id (primary key), name (indexed), birthday. Say it has a lot of data. If you ask MySQL to look for Bob's birthday, it can do that fairly quickly: first, it finds Bob in the name index (this takes a few seeks, log(n) where n is the row count), then one additional seek to read the actual row in the data file and read the birthday from it. That's very quick, and far quicker than scanning the entire table.
Next, consider doing a name LIKE 'Z%'. That is probably a fairly small portion of the table, so it's still faster to find where the Zs start in the name index, then for each one seek the data file to read the row. (This is a range scan.)
Finally, consider asking for all names starting with M-Z. That's probably around half the data. It could do a range scan, and then a lot of seeks, but seeking randomly over the datafile with the ultimate goal of reading half the rows isn't optimal: it'd be faster to just do a big sequential read over the data file. So, in this case, the index will be ignored.
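The three cases above boil down to a cost comparison, which the following toy model illustrates. The two cost constants are invented for the example and the function name is hypothetical; a real optimizer works from table statistics and a much richer cost model.

```python
# Made-up relative costs, chosen only to show the crossover.
RANDOM_SEEK_COST = 10.0     # assumed cost of fetching one row through the index
SEQUENTIAL_ROW_COST = 1.0   # assumed cost of reading one row in a sequential scan


def choose_access_path(total_rows: int, matching_rows: int) -> str:
    """Pick the cheaper of 'seek once per matching row' and 'read everything'."""
    index_cost = matching_rows * RANDOM_SEEK_COST
    scan_cost = total_rows * SEQUENTIAL_ROW_COST
    return "range scan" if index_cost < scan_cost else "full scan"


bob = choose_access_path(1_000_000, 1)                # a single name
z_names = choose_access_path(1_000_000, 20_000)       # a small slice of the table
m_to_z = choose_access_path(1_000_000, 500_000)       # about half the table
```

With these constants the index stops paying off once roughly a tenth of the rows match; changing the predicate value moves a query from one side of that line to the other.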
This is what you're seeing, except in your case there is another key it can fall back on. (It's also possible that it would actually use the date index if it didn't have the other; it should pick whichever index will be quickest. Beware that MySQL's optimizer often makes errors in this.)
So, in short, this is expected. A query doesn't say how to retrieve the data, rather it says what data to retrieve. The database's optimizer is supposed to find the quickest way to retrieve it.
You may find that an index on both columns, in the order (public_key, created_on_date), is preferred in both cases and speeds up your query. This is because MySQL generally uses only one index per table in a query (index merge optimizations aside). Also, the date goes at the end because a range scan can only be done efficiently on the last column used in an index: columns after the first range column cannot narrow the scan any further.
[InnoDB actually has another layer of indirection, I believe, but it'd just confuse the point. It doesn't make a difference to the explanation.]