A very simple problem, yet hard to find a solution for.
An address table with 2,498,739 rows has min_ip and max_ip fields. These are the core columns used for filtering.
The query is very simple.
SELECT *
FROM address a
WHERE min_ip < value
AND max_ip > value;
So it is logical to create an index on min_ip and max_ip to make the query faster. The following indexes were created:
CREATE INDEX ip_range ON address (min_ip, max_ip) USING BTREE;
CREATE INDEX min_ip ON address (min_ip ASC) USING BTREE;
CREATE INDEX max_ip ON address (max_ip DESC) USING BTREE;
I did try creating just the first option (the combination of min_ip and max_ip), but it did not help, so I prepared at least 3 indexes to give MySQL more options for index selection. (Note that this table is pretty much static, more of a lookup table.)
+------------------------+---------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------------+------+-----+---------------------+-----------------------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| network | varchar(20) | YES | | NULL | |
| min_ip | int(11) unsigned | NO | MUL | NULL | |
| max_ip | int(11) unsigned | NO | MUL | NULL | |
+------------------------+---------------------+------+-----+---------------------+-----------------------------+
Now, it should be straightforward to query the table with min_ip and max_ip as the filter criteria.
EXPLAIN
SELECT *
FROM address a
WHERE min_ip < 2410508496
AND max_ip > 2410508496;
The query runs in roughly 0.120 to 0.200 seconds. However, under load testing, performance degrades rapidly.
MySQL server CPU usage skyrockets to 100% with just a few simultaneous queries, and performance does not scale.
The slow query log was enabled with a threshold of 10 seconds, and the SELECT query starts appearing in it just a few seconds into the load test.
So I checked the query with EXPLAIN and found that it didn't use an index.
Explain plan result
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ---------------------- ------ ------- ------ ------- -------------
1 SIMPLE a ALL ip_range,min_ip,max_ip (NULL) (NULL) (NULL) 2417789 Using where
Interestingly, it was able to identify ip_range, min_ip and max_ip as potential indexes (in possible_keys), but it never uses any of them, as shown in the key column.
I know I can use FORCE INDEX, so I tried EXPLAIN with it.
EXPLAIN
SELECT *
FROM address a
FORCE INDEX (ip_range)
WHERE min_ip < 2410508496
AND max_ip > 2410508496;
Explain plan with FORCE INDEX result
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ------------- -------- ------- ------ ------- -----------------------
1 SIMPLE a range ip_range ip_range 4 (NULL) 1208894 Using index condition
With FORCE INDEX, it does use the ip_range index as the key, and rows shows a subset of the table: 1,208,894 instead of the 2,417,789 from the query without FORCE INDEX.
So using the index should definitely give better performance (unless I misunderstood the EXPLAIN result).
But what is more interesting: after a couple of tests, I found that in some instances MySQL does use the index even without FORCE INDEX. My observation is that when the value is small, it uses the index.
EXPLAIN
SELECT *
FROM address a
WHERE min_ip < 508496
AND max_ip > 508496;
Explain Result
id select_type table type possible_keys key key_len ref rows Extra
------ ----------- ------ ------ ---------------------- -------- ------- ------ ------ -----------------------
1 SIMPLE a range ip_range,min_ip,max_ip ip_range 4 (NULL) 1 Using index condition
So it puzzled me that, based on the value passed to the SELECT query, MySQL decides when to use an index and when not to.
I can't imagine what the basis is for deciding whether to use the index for a given value. I do understand that an index may not be used if there is no suitable index matching the WHERE condition, but in this case it is very clear that the ip_range index, built on the min_ip and max_ip columns, is suitable for this WHERE condition.
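The optimizer's flip between plans is easier to see with a toy cost model: an index range scan pays roughly one random lookup back into the table per matched row, while a full scan reads every row sequentially at a much lower per-row cost. The sketch below uses made-up cost constants (not MySQL's real ones) together with the row counts from the EXPLAIN output above:

```python
# Illustrative cost model for why an optimizer abandons an index once a
# range predicate matches a large fraction of the table.
# The cost constants are invented for illustration, not MySQL's real numbers.

def index_scan_cost(matched_rows, random_read_cost=4.0):
    # Each matched index entry may need a separate random lookup into the table.
    return matched_rows * random_read_cost

def full_scan_cost(total_rows, seq_read_cost=1.0):
    # Sequential reads are much cheaper per row.
    return total_rows * seq_read_cost

total = 2_417_789      # rows in the table (from EXPLAIN)
wide = 1_208_894       # rows matched by the large value (from FORCE INDEX plan)
narrow = 1             # rows matched by the small value

print("narrow range: index cheaper?", index_scan_cost(narrow) < full_scan_cost(total))
print("wide range:   index cheaper?", index_scan_cost(wide) < full_scan_cost(total))
```

Under this model the narrow range strongly favors the index, while the wide range (half the table) makes the full scan cheaper, which matches the observed plan switch.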
But the bigger problem I have is: what about other queries? Do I have to go and test them all at this scale? And even then, as the data grows, can I rely on MySQL to keep using the index?
Yes, I can always use FORCE INDEX to ensure it uses the index, but that is not standard SQL and does not work on all databases. ORM frameworks may not support FORCE INDEX syntax when generating SQL, and it tightly couples your queries to your index names.
Not sure if anyone else has encountered this issue, but it seems like a very big problem to me.
Fully agree with Vatev and the others. And MySQL is not the only database that does this: scanning the table is sometimes cheaper than looking at the index first and then looking up the corresponding entries on disk.
The only time it will use the index for sure is when it's a covering index, meaning every column in the query (for this particular table, of course) is present in the index. So if you need, for example, only the network column
SELECT network
FROM address a
WHERE min_ip < 2410508496
AND max_ip > 2410508496;
then a covering index like
CREATE INDEX ip_range ON address (min_ip, max_ip, network) USING BTREE;
would only look at the index, as there's no need to look up additional data on disk at all. The whole index could even be kept in memory.
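Here is a small sketch of that effect, using SQLite via Python's sqlite3 module as a stand-in for MySQL (the covering-index principle is the same in both engines; the table contents are invented):

```python
import sqlite3

# Covering-index sketch: every selected column is in the index, so the
# engine never has to touch the table data. SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, network TEXT,"
             " min_ip INTEGER, max_ip INTEGER)")
conn.execute("CREATE INDEX ip_range ON address (min_ip, max_ip, network)")
conn.executemany("INSERT INTO address (network, min_ip, max_ip) VALUES (?, ?, ?)",
                 [("10.0.0.0/8", 100, 200), ("172.16.0.0/12", 300, 400)])

# Inspect the plan; with the index above, SQLite can typically satisfy
# this query from the index alone ("covering index").
plan = conn.execute("EXPLAIN QUERY PLAN SELECT network FROM address "
                    "WHERE min_ip < 150 AND max_ip > 150").fetchall()
print(plan)

rows = conn.execute("SELECT network FROM address "
                    "WHERE min_ip < 150 AND max_ip > 150").fetchall()
print(rows)  # [('10.0.0.0/8',)]
```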
Ranges like that are nasty to optimize, but I have a technique. It requires non-overlapping ranges and stores only a start_ip, not the end_ip (which is effectively available from the 'next' record). It provides stored routines to hide the messy code, involving ORDER BY ... LIMIT 1 and other tricks. For most operations it won't hit more than one block of data, unlike the obvious approaches, which tend to fetch half or all of the table.
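A minimal sketch of that technique, assuming non-overlapping ranges, with SQLite via Python's sqlite3 as a stand-in and invented table/column names: each row stores only where its range starts, and a single ORDER BY ... DESC LIMIT 1 probe finds the range that owns a given IP.

```python
import sqlite3

# "start_ip only" technique for NON-overlapping ranges: each row stores
# where its range begins; the range implicitly ends where the next row
# begins. One index probe answers a lookup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ip_blocks (start_ip INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO ip_blocks VALUES (?, ?)",
                 [(0, "unassigned"), (1000, "net-A"),
                  (2000, "net-B"), (3000, "net-C")])

def lookup(ip):
    # The newest block starting at or before `ip` owns it.
    return conn.execute(
        "SELECT start_ip, owner FROM ip_blocks "
        "WHERE start_ip <= ? ORDER BY start_ip DESC LIMIT 1", (ip,)).fetchone()

print(lookup(1500))  # (1000, 'net-A')
print(lookup(2999))  # (2000, 'net-B')
```

Because the predicate anchors on a single indexed column, the probe descends the B-tree once instead of scanning half the table the way `min_ip < v AND max_ip > v` does.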
I agree with all the answers above, but you could try making just one composite index like this:
create index ip_rang on address (min_ip ASC, max_ip DESC) using BTREE;
Keep in mind that an index also uses disk space, so choose the optimal set of indexes for your use.
Let's say I have a MySQL table defined like this:
CREATE TABLE big_table (
primary_1 varbinary(1536),
primary_2 varbinary(1536),
ts timestamp(6),
...
PRIMARY KEY (primary_1, primary_2),
KEY ts_idx (ts)
)
I would like to implement efficient pagination (seeking pagination) as described in this blog post https://use-the-index-luke.com/sql/partial-results/top-n-queries
If I only use the first part of the primary key, the pipelined execution works fast and as expected:
mysql> explain select * from big_table order by ts, primary_1 limit 5;
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
| 1 | SIMPLE | big_table | NULL | index | NULL | ts_idx | 7 | NULL | 5 | 100.00 | NULL |
+----+-------------+-------------------------------------+------------+-------+---------------+--------+---------+------+------+----------+-------+
However, if I add the second part of the primary key to the ORDER BY clause everything slows down and filesort starts being used:
mysql> explain select * from big_table order by ts, primary_1, primary_2 limit 5;
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| 1 | SIMPLE | big_table | NULL | ALL | NULL | NULL | NULL | NULL | 6388499 | 100.00 | Using filesort |
+----+-------------+-------------------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------+
Is it not possible to do this pipelined execution and ordering on composite primary? Or should the query be written in some special way?
Without prior knowledge about how MySQL works internally, there is no reason to assume that an index on just ts can be used to order by ts, primary_1 without doing an additional (file)sort on primary_1. Imagine e.g. the edge case that all values for ts are the same - the index will just give you all rows, which you then have to sort by primary_1.
Nevertheless, MySQL can make use of some additional information: InnoDB stores secondary indexes in a way that includes the primary key columns (to be able to find the actual row in the table). Since that information is there anyway, MySQL can just make use of it - and it does, by using Index Extensions. This basically extends the index ts to an index ts, primary_1, primary_2.
So this technical trick allows you to use the index on ts to order by ts, primary_1, primary_2. But since there is always a "but", here is the "but":
Use of index extensions by the optimizer is subject to the usual limits on the number of key parts in an index (16) and the maximum key length (3072 bytes).
The index on ts, primary_1, primary_2 would be longer than 3072 bytes. You can e.g. also not create such an index manually. So this extension doesn't work anymore, and MySQL falls back to treating the index on ts like an index on just ts.
So why does it work for order by ts, primary_1? Well, even if, for those technical reasons, MySQL cannot create an internal index on ts, primary_1, primary_2, it could at least do it for ts, primary_1 without running into technical problems. MySQL actually doesn't do that though - but the MariaDB developers implemented this trick, so I assume you are actually using MariaDB. Nevertheless, the length restriction of 3072 still applies, so your order by both primary columns still won't work.
What can you do?
If you can shorten your primary keys a bit, the index extension would work again. Primary keys that long (and of that type) are uncommon and impractical anyway (not only for this use case), so maybe you can find a different primary key for your table.
If that is not an option, you may be able to utilize some prior knowledge about your data distribution, e.g. if you know that at most 10 values for ts can be the same, you can first pick the first n+10 rows (using the index), then order only those by the primary keys. If you usually only show the first few pages, this might speed up your specific situation. But you may want to ask a separate question for it with specific details.
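The "fetch n+K rows, then sort" idea can be sketched as follows, with SQLite standing in for MySQL/MariaDB, and with K, the schema, and the data all being illustrative assumptions:

```python
import sqlite3

# If at most K rows can share a ts value, fetch the first n+K rows ordered
# by ts alone (an index-friendly scan, no filesort), then finish the exact
# (ts, pk) ordering on the small candidate set client-side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (pk TEXT PRIMARY KEY, ts INTEGER)")
conn.execute("CREATE INDEX ts_idx ON big_table (ts)")
conn.executemany("INSERT INTO big_table VALUES (?, ?)",
                 [("b", 1), ("a", 1), ("d", 2), ("c", 2), ("e", 3), ("f", 4)])

def first_page(n, k=2):
    # Step 1: index-ordered fetch of n+k candidates.
    candidates = conn.execute(
        "SELECT pk, ts FROM big_table ORDER BY ts LIMIT ?", (n + k,)).fetchall()
    # Step 2: exact (ts, pk) order on the candidates only.
    candidates.sort(key=lambda r: (r[1], r[0]))
    return candidates[:n]

print(first_page(3))  # [('a', 1), ('b', 1), ('c', 2)]
```

This is correct because the ts group containing the n-th result has at most K members, so fetching n+K candidates is guaranteed to include all of them before the client-side sort.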
I have some tables I want to join, but it cannot take dozens of seconds.
I want to go from this query that takes ~1s
SELECT COUNT(*) FROM business_group bg WHERE bg.group_id=1040
+----------+
| COUNT(*) |
+----------+
| 1229380 |
+----------+
1 row in set
Time: 1.173s
to this joined query that is taking ~50s
SELECT COUNT(*) FROM business b
INNER JOIN business_group bg ON b.id=bg.business_id
WHERE bg.group_id=1040
+----------+
| COUNT(*) |
+----------+
| 1229380 |
+----------+
1 row in set
Time: 51.346s
Why does it take that long if the only thing it does differently is to join on the primary key of the business table (business.id)?
Besides this primary key index, I also have this one (group_id, business_id) on business_group (with (business_id, group_id) it took even longer).
Following is the execution plan:
+----+-------------+-------+------------+--------+---------------------------------------------------------+-----------------------------+---------+----------------------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------------------------------------------------+-----------------------------+---------+----------------------+---------+----------+--------------------------+
| 1 | SIMPLE | bg | <null> | ref | FKo2q0jurx07ein31bgmfvuk8gf,idx_bg_group_id_business_id | idx_bg_group_id_business_id | 9 | const | 2654528 | 100.0 | Using index |
| 1 | SIMPLE | b | <null> | eq_ref | PRIMARY | PRIMARY | 4 | database.bg.group_id | 1 | 100.0 | Using where; Using index |
+----+-------------+-------+------------+--------+---------------------------------------------------------+-----------------------------+---------+----------------------+---------+----------+--------------------------+
Is it possible to optimize the second query so it takes less time?
business table is ~45M rows while business_group is ~60M rows.
I'm writing this as someone who does a lot of indexing setups on SQL Server rather than MySQL. It is too long as a comment, and is based on what I believe are fundamentals, so hopefully it will help.
Why?
Firstly - why does it take so long for the second query to run? The answer is that it needs to do a lot more work in the second one.
To demonstrate, imagine the only non-clustered index you have is on business_group for group_id.
You run the first query SELECT COUNT(*) FROM business_group bg WHERE bg.group_id=1040.
All the engine needs to do is to seek to the appropriate spot in the index (where group_id = 1040), then read/count rows from the index (which is just a series of ints) until it changes - because the non-clustered index is sorted by that column.
Note if you had a second field in the non-clustered index (e.g., group_id, business_id), it would be almost as fast because it's still sorted on group_id first. However, the read will be slightly larger, as each row is twice the size of the single-column version (but it would still be very small).
Imagine you then run a slightly different query, counting business_id instead of * e.g., SELECT COUNT(business_id) FROM business_group bg WHERE bg.group_id=1040.
Assuming business_id is not the PK (and is not in the non-clustered index), then for every row it finds in the index, it needs to go back to the table and read business_id to check that it's not null (either via some sort of loop/lookup per row, or by reading the whole table - I'm not 100% sure how MySQL handles this). Either way, it is a lot more work than above.
If business_id was in the index (as above, for group_id, business_id), then it could read that data straight from the index and not need to refer back to the original table - which is good.
Now add the join (your second query) SELECT COUNT(*) FROM business b INNER JOIN business_group bg ON b.id=bg.business_id WHERE bg.group_id=1040. The engine needs to
Get each business_id as above
Potentially sort the business IDs to help with the join
Join it to the business table (to ensure it has a valid row in the business table)
... and to do so, it may need to read all the row's data in the business table
Suggestions #1 - Avoid going to the business table
If you set up foreign keys to ensure that business_id in business_group is valid - then do you need to run the version with the join? Just run the first version.
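To illustrate why the join becomes redundant once referential integrity is guaranteed, here is a small sketch (SQLite via Python's sqlite3 as a stand-in, with invented data): both counts come out identical because every business_id resolves to exactly one business row.

```python
import sqlite3

# When every business_group.business_id is guaranteed (e.g. by a foreign
# key) to exist in business, the inner join cannot add or drop rows, so
# COUNT(*) is identical with or without it.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE business (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE business_group ("
             "business_id INTEGER REFERENCES business(id), group_id INTEGER)")
conn.executemany("INSERT INTO business VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO business_group VALUES (?, ?)",
                 [(1, 1040), (2, 1040), (3, 1040), (3, 9999)])

plain = conn.execute(
    "SELECT COUNT(*) FROM business_group WHERE group_id = 1040").fetchone()[0]
joined = conn.execute(
    "SELECT COUNT(*) FROM business b JOIN business_group bg "
    "ON b.id = bg.business_id WHERE bg.group_id = 1040").fetchone()[0]
print(plain, joined)  # 3 3
```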
Suggestion #2 - Indexes
If this was SQL Server and you needed that second query to run as fast as possible, I would set up two non-clustered indexes
NONCLUSTERED INDEX ... ON business_group (group_id, business_id)
NONCLUSTERED INDEX ... ON business (id)
The first means the engine can seek directly to the specific group_id, and then get a sorted list of business_id.
The second provides a sorted list of id (business_id) from the business table. As it has the same sort order as the results from the first index, the join is a lot less work.
However, the second one is controversial - many people would say 'no' to this as it overlaps your PK (or, more specifically, clustered index). It would also be sorted the same way. However, at least in SQL Server, this would include all the other data about the businesses e.g., the business name, address, etc. So to read the list of IDs from business, you'd also need to read the rest of the data - taking a lot more time.
However, if you put a non-clustered index just on ID, it will be a very narrow index (just the IDs) and therefore the amount of data to be read would be much less - and therefore often a lot faster.
Note though, that this is not as fast as if you could avoid doing the join altogether (e.g., Suggestion #1 above).
We are currently evaluating mysql for one of our use case related to analytics.
The table schema is somewhat like this:
CREATE TABLE IF NOT EXISTS `analytics`(
`dt` DATE,
`dimension1` BIGINT UNSIGNED,
`dimension2` BIGINT UNSIGNED,
`metrics1` BIGINT UNSIGNED,
`metrics2` BIGINT UNSIGNED,
INDEX `baseindex` (`dimension1`,`dt`)
);
Since most queries filter on dimension1 and dt, we felt a combined index would be the best way to optimize query lookups.
With this table schema in mind, we ran the following EXPLAIN:
explain
select dimension2,dimension1
from analytics
where dimension1=1123 and dt between '2016-01-01' and '2016-01-30';
The query returns the following:
+----+-------------+-----------+------+---------------+-----------+---------+-------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-----------+---------+-------------+------+-----------------------+
| 1 | SIMPLE | analytics | ref | baseindex | baseindex | 13 | const,const | 1 | Using index condition |
+----+-------------+-----------+------+---------------+-----------+---------+-------------+------+-----------------------+
This looks good so far, as it indicates that the index is being used.
However, we thought we could optimize this a bit further: since most of our lookups will be for the current month, or month-based, we felt date partitioning would further improve performance.
The table was later modified to add partitions by month
ALTER TABLE analytics
PARTITION BY RANGE( TO_DAYS(`dt`))(
PARTITION JAN2016 VALUES LESS THAN (TO_DAYS('2016-02-01')),
PARTITION FEB2016 VALUES LESS THAN (TO_DAYS('2016-03-01')),
PARTITION MAR2016 VALUES LESS THAN (TO_DAYS('2016-04-01')),
PARTITION APR2016 VALUES LESS THAN (TO_DAYS('2016-05-01')),
PARTITION MAY2016 VALUES LESS THAN (TO_DAYS('2016-06-01')),
PARTITION JUN2016 VALUES LESS THAN (TO_DAYS('2016-07-01')),
PARTITION JUL2016 VALUES LESS THAN (TO_DAYS('2016-08-01')),
PARTITION AUG2016 VALUES LESS THAN (TO_DAYS('2016-09-01')),
PARTITION SEPT2016 VALUES LESS THAN (TO_DAYS('2016-10-01')),
PARTITION OCT2016 VALUES LESS THAN (TO_DAYS('2016-11-01')),
PARTITION NOV2016 VALUES LESS THAN (TO_DAYS('2016-12-01')),
PARTITION DEC2016 VALUES LESS THAN (TO_DAYS('2017-01-01'))
);
With the partition in place, the same query now returns the following results
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------+---------+------+------+-------------+
| 1 | SIMPLE | analytics | range | baseindex | baseindex | 13 | NULL | 1 | Using where |
+----+-------------+-----------+-------+---------------+-----------+---------+------+------+-------------+
Now the "Extra" column shows "Using where" instead of "Using index condition".
We have not noticed any performance boost or degradation, so I'm curious how adding a partition changes the value in the Extra column.
This is too long for a comment.
MySQL partitions both the data and the indexes. So, the result of your query is that the query is accessing a smaller index which refers to fewer data pages.
Why don't you see a performance boost? Well, looking up rows in a smaller index is negligible savings (although there might be some savings for the first query from a cold start because the index has to be loaded into memory).
I am guessing that the data you are looking for is relatively small -- say, the records come from a handful of data pages. Well, fetching a handful of data pages from a partition is pretty much the same thing as fetching a handful of data pages from the full table.
Does this mean that partitioning is useless? Not at all. For one thing, the partitioned data and index is much smaller than the overall table. So, you have a savings in memory on the server side -- and this can be a big win on a busy server.
In general, though, partitions really shine when you have queries that don't fully use indexes. The smaller data sizes in each partition often make such queries more efficient.
Use NOT NULL (wherever appropriate).
Don't use BIGINT (8 bytes) unless you really need huge numbers. Dimension ids can usually fit in SMALLINT UNSIGNED (0..64K, 2 bytes) or MEDIUMINT UNSIGNED (0..16M, 3 bytes).
Yes, INDEX(dim1, dt) is optimal for that one SELECT.
No, PARTITIONing will not help for that SELECT.
PARTITION BY RANGE(TO_DAYS(..)) is excellent if you intend to delete old data. But there is rarely any other benefit.
Use InnoDB.
Explicitly specify the PRIMARY KEY. It will be important in the discussion below.
When working with huge databases, it is a good idea to "count the disk hits". So, let's analyze your query.
INDEX(dim1, dt) with WHERE dim1 = a AND dt BETWEEN x and y will
If partitioned, prune down to the partition(s) representing x..y.
Drill down in the Index's BTree to [a,x]. With partitioning the BTree might be 1 level shallower, but that savings is lost to the pruning of step 1.
Scan forward until [a,y]. If only one partition is involved, this scan hits exactly the same number of blocks whether partitioned or not. If multiple partitions are needed, then there is some extra overhead.
For each row, use the PRIMARY KEY to reach over into the data to get dim2. Again, virtually the same amount of effort. Without knowing the Engine and the PRIMARY KEY, I cannot finish discussing this #4.
If (dim1, dim2, dt) is unique, make it the PK. In that case, INDEX(dim1, dt) is actually (dim1, dt, dim2), since the PK is included in every secondary index. That means #4 really involves a 'covering' index - that is, no extra work to reach for dim2 (zero disk hits).
If, on the other hand, you did SELECT metric..., then #4 does have the effort mentioned.
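SQLite's WITHOUT ROWID tables store secondary indexes the same way InnoDB does (with the primary-key columns appended), so they can illustrate the covering effect described above. Names and data below are invented:

```python
import sqlite3

# With PK (dim1, dim2, dt), a secondary index on (dim1, dt) implicitly
# carries dim2 as well (the PK columns ride along in the index entries),
# so SELECT dim2 ... can be answered from the index alone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analytics ("
             "dim1 INTEGER, dim2 INTEGER, dt TEXT, metric1 INTEGER, "
             "PRIMARY KEY (dim1, dim2, dt)) WITHOUT ROWID")
conn.execute("CREATE INDEX baseindex ON analytics (dim1, dt)")
conn.executemany("INSERT INTO analytics VALUES (?, ?, ?, ?)",
                 [(1123, 7, "2016-01-05", 10), (1123, 8, "2016-02-05", 20)])

# Inspect the plan; SQLite can typically satisfy this from the index alone.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT dim2 FROM analytics "
    "WHERE dim1 = 1123 AND dt BETWEEN '2016-01-01' AND '2016-01-30'").fetchall()
print(plan)

rows = conn.execute(
    "SELECT dim2 FROM analytics "
    "WHERE dim1 = 1123 AND dt BETWEEN '2016-01-01' AND '2016-01-30'").fetchall()
print(rows)  # [(7,)]
```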
I just stumbled upon a few lines of code in a system I just started working with that I don't really get. The system has a large table that saves lots of entities with unique IDs and removes them once they're not longer needed but it never reuses them. So the table looks like this
------------------------
| id |info1|info2|info3|
------------------------
| 1 | foo1| foo2| foo3|
------------------------
| 17 | bar1| bar2| bar3|
------------------------
| 26 | bam1| bam2| bam3|
------------------------
| 328| baz1| baz2| baz3|
------------------------
etc.
In one place in the codebase there is a while loop whose purpose is to loop through all entities in the DB and do things to them. Right now this is solved like this:
int lastId = fetchMaxId()
int id = 0
while ((id = fetchNextId(id, lastId)) != 0){
doStuffWith(id)
}
where fetchMaxId is straightforward
int fetchMaxId(){
return sqlQuery("SELECT MAX(id) FROM Table")
}
but fetchNextId confuses me. It is implemented as
int fetchNextId(currentId, maxId){
return sqlQuery("
SELECT id FROM Table where id > :currentId and id <= :maxId LIMIT 1
")
}
This system has been in production for several years, so it obviously works, but when I searched for an explanation of why it works, I only found people saying the same thing I already thought I knew: the order in which a MySQL DB returns results is not easily determined and should not be relied upon, so if you want a particular order, use an ORDER BY clause. But are there times when you can safely omit the ORDER BY? This code has worked for 12 years and has survived several DB updates. Are we just lucky, or am I missing something here? Before I saw this code I would have said that if you called
fetchNextId(1, 328)
you could end up with either 17 or 26 as the answer.
Some clues as to why this works may be that the id column is the primary key of the table in question and is set to auto-increment, but I can't find any documentation that explains why
fetchNextId(1, 328)
should always return 17 when called on the table snippet given above.
The short answer is yes, the primary key has an order, all indexes have an order, and a primary key is simply a unique index.
As you have rightly said, you should not rely on data being returned in the order the data is stored in, the optimiser is free to return it in any order it likes, and this will be dependent on the query plan. I will however attempt to explain why your query has worked for 12 years.
Your clustered index is just your table data, and your clustering key defines the order it is stored in. The data is stored at the leaf level, and the clustering key helps the root (and intermediate nodes) act as pointers to quickly get to the right leaf to retrieve the data. A nonclustered index is a very similar structure, but the lowest level simply contains a pointer to the correct position on the leaf of the clustered index.
In MySQL the primary key and the clustered index are synonymous, so the primary key is ordered, however they are fundamentally two different things. In other DBMS you can define both a primary key and a clustered index, when you do this your primary key becomes a unique nonclustered index with a pointer back to the clustered index.
In its simplest terms, you can imagine a table with an ID column that is the primary key, and another column (A). The B-Tree structure for your clustered index would be something like:
Root Node
+---+
| 1 |
+---+
Intermediate Nodes
+---+ +---+ +---+
| 1 | | 4 | | 7 |
+---+ +---+ +---+
Leaf
+-----------+ +-----------+ +-----------+
ID -> | 1 | 2 | 3 | | 4 | 5 | 6 | | 7 | 8 | 9 |
A -> | A | B | C | | D | E | F | | G | H | I |
+-----------+ +-----------+ +-----------+
In reality the leaf pages will be much bigger, but this is just a demo. Each page also has a pointer to the next page and the previous page for ease of traversing the tree. So when you do a query like:
SELECT ID, A
FROM T
WHERE ID > 5
LIMIT 1;
you are scanning a unique index so it is very likely this will be a sequential scan. Very likely is not guaranteed though.
MySQL will scan the Root node, if there is a potential match it will move on to the intermediate nodes, if the clause had been something like WHERE ID < 0 then MySQL would know that there were no results without going any further than the root node.
Once it moves on to the intermediate node, it can identify that it needs to start on the second page (between 4 and 7) to search for an ID > 5. So it will sequentially scan the leaf starting at the second leaf page and, having already identified the LIMIT 1, it will stop once it finds a match (in this case 6) and return this data from the leaf. In such a simple example this behaviour appears reliable and logical. I have tried to force exceptions by choosing an ID value I know is at the end of a leaf page, to see if the leaf will be scanned in reverse order, but so far I have been unable to produce this behaviour. This does not, however, mean it won't happen, or that future releases of MySQL won't do it in the scenarios I have tested.
In short, just add an order by, or use MIN(ID) and be done with it. I wouldn't lose too much sleep trying to delve into the inner workings of the query optimiser to see what kind of fragmentation, or data ranges would be required to observe different ordering of the clustered index within the query plan.
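The suggested fix can be sketched like this (SQLite via Python's sqlite3 as a stand-in, with the table data from the question): MIN(id) makes "the next id" well-defined instead of depending on scan order.

```python
import sqlite3

# Deterministic rewrite of fetchNextId: MIN(id) (or ORDER BY id LIMIT 1)
# defines "the next id" explicitly instead of relying on scan order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, info1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "foo1"), (17, "bar1"), (26, "bam1"), (328, "baz1")])

def fetch_next_id(current_id, max_id):
    # Correct regardless of storage layout or index traversal direction.
    return conn.execute(
        "SELECT MIN(id) FROM t WHERE id > ? AND id <= ?",
        (current_id, max_id)).fetchone()[0]

ids, cur = [], 0
max_id = conn.execute("SELECT MAX(id) FROM t").fetchone()[0]
while (cur := fetch_next_id(cur, max_id)) is not None:
    ids.append(cur)
print(ids)  # [1, 17, 26, 328]
```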
The answer to your question is yes. If you look at MySQL documentation you will see that whenever a table has a primary key it has an associated index.
When looking at the documentation for indexes you will see that they will mention primary keys as a type of index.
So in case of your particular scenario:
SELECT id FROM Table where id > :currentId and id <= :maxId LIMIT 1
The query will stop executing as soon as it has found a value because of the LIMIT 1.
Without the LIMIT 1 it would have returned 17, 26 and 328.
However, with all that said, I don't think you will run into any ordering problems when the primary key is auto-incrementing. But in a scenario where the primary key is, say, a unique employee number instead of an auto-incrementing field, I would not trust the order of the result, because the documentation also notes that MySQL reads sequentially, so it is possible for a primary key to fall outside the WHERE clause conditions and be skipped.
I have a table from a legacy system which does not have a primary key. It records transactional data for issuing materials in a factory.
For simplicity's sake, let's say each row contains job_number, part_number, quantity and date_issued.
I added an index to the date issued column. When I run an EXPLAIN SELECT * FROM issued_parts WHERE date_issued > '20100101', it shows this:
+----+-------------+----------------+------+-------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | issued_parts | ALL | date_issued_alloc | NULL | NULL | NULL | 9724620 | Using where |
+----+-------------+----------------+------+-------------------+------+---------+------+---------+-------------+
So it sees the key, but it doesn't use it?
Can someone explain why?
Something tells me the MySQL Query Optimizer decided correctly.
Here is how you can tell. Run these:
Count of Rows
SELECT COUNT(1) FROM issued_parts;
Count of Rows Matching Your Query
SELECT COUNT(1) FROM issued_parts WHERE date_issued > '20100101';
If the number of rows you are actually retrieving exceeds 5% of the table's total number, the MySQL Query Optimizer decides it would be less effort to do a full table scan.
Now, if your query was more exact, for example, with this:
SELECT * FROM issued_parts WHERE date_issued = '20100101';
then, you will get a different EXPLAIN plan altogether.
possible_keys names keys with the relevant columns in, but that doesn't mean that each key in it is going to be useful for the query. In this case, none are.
There are multiple types of indexes (indices?). A hash index is a fast way to look up an item given a specific value. If you have a bunch of discrete values that you are querying against (for example, a list of 10 dates), then you can calculate a hash for each of those values and look them up in the index. Since you aren't doing a lookup on a specific value, but rather doing a comparison, a hash index won't help you.
On the other hand, a B-Tree index can help you because it gives an ordering to the elements it is indexing. For instance, see here: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html for MySQL (search for B-Tree Index Characteristics). You may want to check that your table is using a B-Tree index for its index column.
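The distinction can be sketched in plain Python, with a dict playing the hash index and a sorted list plus bisect playing the B-tree (the data is invented):

```python
import bisect

# Hash vs. B-tree in miniature: a dict (hash) answers only exact-match
# lookups; a sorted list plus bisect (a stand-in for a B-tree) answers
# range queries like date_issued > X by seeking once and scanning forward.
rows = [("20091215", "job-1"), ("20100102", "job-2"),
        ("20100330", "job-3"), ("20100711", "job-4")]

hash_index = {d: j for d, j in rows}        # supports equality only
sorted_dates = sorted(d for d, _ in rows)   # "B-tree" ordering

# Exact lookup: both structures work.
print(hash_index["20100102"])                # job-2

# Range lookup (date_issued > '20100101'): only the ordered index helps.
start = bisect.bisect_right(sorted_dates, "20100101")
print(sorted_dates[start:])                  # ['20100102', '20100330', '20100711']
```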