What does cardinality mean in MySQL when using a composite index?

I'm new to MySQL and a little confused about what cardinality means. I read that it means the number of unique rows, but I'd like to know what it means in this case. This is my table definition:
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| revisado | varchar(10) | YES | MUL | NULL | |
| total | int(11) | NO | MUL | NULL | |
| busqueda | varchar(300) | NO | MUL | NULL | |
| clave | bigint(15) | NO | | NULL | |
| producto_servicio | varchar(300) | NO | | NULL | |
+-------------------+--------------+------+-----+---------+----------------+
The table currently has 13621 records.
I have this query:
SELECT clave, producto_servicio FROM buscador_claves2 WHERE busqueda = 'FERRETERIA' AND total = 2 AND revisado = 'APROBADO'
And this is the index definition of the table:
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| buscador_claves2 | 0 | PRIMARY | 1 | id | A | 14309 | NULL | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_busqueda | 1 | busqueda | A | 14309 | 255 | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_total | 1 | total | A | 3 | NULL | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_revisado | 1 | revisado | A | 1 | NULL | NULL | YES | BTREE | |
| buscador_claves2 | 1 | idx_compuesto1 | 1 | revisado | A | 1 | NULL | NULL | YES | BTREE | |
| buscador_claves2 | 1 | idx_compuesto1 | 2 | total | A | 105 | NULL | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_compuesto1 | 3 | busqueda | A | 14309 | 255 | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_compuesto2 | 1 | busqueda | A | 14309 | 255 | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_compuesto2 | 2 | total | A | 14309 | NULL | NULL | | BTREE | |
| buscador_claves2 | 1 | idx_compuesto2 | 3 | revisado | A | 14309 | NULL | NULL | YES | BTREE | |
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
The query uses idx_compuesto1 to find the data. What does the cardinality mean in this case for the revisado, total and busqueda columns as parts of idx_compuesto1? And why does it choose idx_compuesto1 instead of idx_compuesto2? I can see that the cardinality differs between the two indexes.
This is the EXPLAIN output for the query:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: buscador_claves2
type: ref
possible_keys: idx_busqueda,idx_total,idx_revisado,idx_compuesto1,idx_compuesto2
key: idx_compuesto1
key_len: 804
ref: const,const,const
rows: 1
Extra: Using where
I hope you can help me understand this information better. Thank you.

In MySQL, the Cardinality column is the storage engine's estimate of the number of unique values in the index. For a composite index, SHOW INDEX lists one row per column. Each row's Cardinality is the estimated number of distinct values of the leftmost prefix ending at that column. For idx_compuesto1 that means about 1 distinct revisado value, 105 distinct (revisado, total) pairs, and 14309 distinct (revisado, total, busqueda) combinations. In idx_compuesto2, busqueda comes first and is estimated to be (almost) unique on its own, so every prefix shows 14309.
The optimizer uses these estimates to judge how selective an index is, both for single-table lookups and for joins. Generally it prefers the index with the higher cardinality, because that usually means it can filter down to fewer rows. Ideally the value would always equal SELECT COUNT(DISTINCT the_key)..., but in practice it is usually off by a relatively small margin. Computing it exactly during normal operation, without disrupting database performance, is difficult. Here, for example, the estimate of 14309 is even higher than the 13621 rows actually in the table. The value is more accurate immediately after ANALYZE TABLE.
Inaccurate cardinality starts to matter when three things combine: the optimizer can choose more than one key for a particular join, the choice makes a big difference in performance, and the estimates are far enough off to make it pick the wrong key. Those situations are relatively rare, but they do happen. In that case, the problem can be solved either with ANALYZE TABLE or, if you are always 100% sure which key is better for the join, by forcing it with FORCE INDEX (also spelled FORCE KEY) in the query.
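To make the composite-index case concrete, here is a small Python model. It computes the exact number of distinct values of each leftmost prefix for a few hypothetical rows, which is the quantity the per-column Cardinality values estimate. It also computes a rough "rows per equality lookup" figure as table rows / cardinality. That ratio is a simplification for illustration, not MySQL's actual cost model. The function names and sample rows are made up.

```python
from typing import List, Sequence, Tuple


def prefix_cardinalities(rows: Sequence[Tuple], n_cols: int) -> List[int]:
    """Exact number of distinct values of each leftmost prefix (columns 1..k) of an index key."""
    return [len({row[:k] for row in rows}) for k in range(1, n_cols + 1)]


def rows_per_lookup(table_rows: int, cardinalities: List[int]) -> List[int]:
    """Rough rows matched by an equality lookup on each prefix: table_rows / cardinality (toy model)."""
    return [max(1, round(table_rows / c)) for c in cardinalities]


# Hypothetical rows in idx_compuesto1 column order: (revisado, total, busqueda)
rows = [
    ("APROBADO", 2, "FERRETERIA"),
    ("APROBADO", 2, "PAPELERIA"),
    ("APROBADO", 3, "FERRETERIA"),
    ("APROBADO", 1, "TLAPALERIA"),
]
print(prefix_cardinalities(rows, 3))  # [1, 3, 4]

# The same rows in idx_compuesto2 column order: (busqueda, total, revisado)
reordered = [(b, t, r) for (r, t, b) in rows]
print(prefix_cardinalities(reordered, 3))  # [3, 4, 4]

# Cardinalities reported for idx_compuesto1, with the 13621 rows from the question
print(rows_per_lookup(13621, [1, 105, 14309]))  # [13621, 130, 1]
```

With all three columns matched against constants, the rough figure drops to 1, in line with rows: 1 in the EXPLAIN. Matching revisado alone through the same index would narrow nothing.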

Related

Query optimizer not using an index

I have two tables CUSTOMER_ORDER_PUBLIC and LINEITEM_PUBLIC which have the following indices:
+-----------------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| CUSTOMER_ORDER_PUBLIC | 1 | O_ORDERKEY | 1 | O_ORDERKEY | A | 2633457 | NULL | NULL | YES | BTREE | | |
| CUSTOMER_ORDER_PUBLIC | 1 | O_ORDERDATE | 1 | O_ORDERDATE | A | 2350 | NULL | NULL | YES | BTREE | | |
| CUSTOMER_ORDER_PUBLIC | 1 | PUB_C_CUSTKEY | 1 | PUB_C_CUSTKEY | A | 273000 | NULL | NULL | | BTREE | | |
+-----------------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
and:
+-----------------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| LINEITEM_PUBLIC | 0 | PRIMARY | 1 | PUB_L_ORDERKEY | A | 16488602 | NULL | NULL | | BTREE | | |
| LINEITEM_PUBLIC | 0 | PRIMARY | 2 | PUB_L_LINENUMBER | A | 44146904 | NULL | NULL | | BTREE | | |
| LINEITEM_PUBLIC | 1 | LINEITEM_PRIVATE_FK2 | 1 | PUB_L_PARTKEY | A | 2083757 | NULL | NULL | | BTREE | | |
| LINEITEM_PUBLIC | 1 | LINEITEM_PRIVATE_FK3 | 1 | PUB_L_SUPPKEY | A | 85599 | NULL | NULL | | BTREE | | |
+-----------------+------------+----------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Each time I run EXPLAIN on a specific query, I get the following:
mysql> EXPLAIN SELECT *
FROM CUSTOMER_ORDER_PUBLIC
LEFT OUTER JOIN LINEITEM_PUBLIC ON O_ORDERKEY= PUB_L_ORDERKEY;
+----+-------------+-----------------------+------------+------+---------------+---------+---------+---------------------------------------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+---------------+---------+---------+---------------------------------------+---------+----------+-------+
| 1 | SIMPLE | CUSTOMER_ORDER_PUBLIC | NULL | ALL | NULL | NULL | NULL | NULL | 2900769 | 100.00 | NULL |
| 1 | SIMPLE | LINEITEM_PUBLIC | NULL | ref | PRIMARY | PRIMARY | 4 | TPCH.CUSTOMER_ORDER_PUBLIC.O_ORDERKEY | 2 | 100.00 | NULL |
+----+-------------+-----------------------+------------+------+---------------+---------+---------+---------------------------------------+---------+----------+-------+
For some reason, the query optimizer does not use the index (O_ORDERKEY), even if I use FORCE INDEX. I know many people have posted similar questions, but I have tried everything and nothing seems to help!
Any other suggestions would be greatly appreciated!
Edit:
The query used is the following:
SELECT * FROM CUSTOMER_ORDER_PUBLIC
LEFT OUTER JOIN LINEITEM_PUBLIC ON O_ORDERKEY= PUB_L_ORDERKEY;
For this query:
SELECT *
FROM CUSTOMER_ORDER_PUBLIC cop LEFT OUTER JOIN
LINEITEM_PUBLIC lp
ON cop.O_ORDERKEY = lp.PUB_L_ORDERKEY;
You want an index on LINEITEM_PUBLIC(PUB_L_ORDERKEY). You already have one, because PUB_L_ORDERKEY is the first column of the primary key.
There is no reason to use an index on CUSTOMER_ORDER_PUBLIC. It is the outer table of the LEFT OUTER JOIN, so every one of its rows goes to the result set anyway.
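The plan in the EXPLAIN (type ALL on CUSTOMER_ORDER_PUBLIC, ref via PRIMARY on LINEITEM_PUBLIC) can be modeled as a nested-loop join. The sketch below assumes a dictionary as a stand-in for the B-tree index on PUB_L_ORDERKEY and uses made-up sample rows. It shows why only the inner table benefits from an index: the outer table is read in full regardless.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def left_join_nested_loop(orders: List[dict], lineitems: List[dict]) -> List[Tuple[dict, Optional[dict]]]:
    # Stand-in for the index on LINEITEM_PUBLIC(PUB_L_ORDERKEY), the primary key prefix.
    index: Dict[int, List[dict]] = defaultdict(list)
    for li in lineitems:
        index[li["PUB_L_ORDERKEY"]].append(li)

    result: List[Tuple[dict, Optional[dict]]] = []
    # Outer table: every row is read, so an index on O_ORDERKEY would not save any reads.
    for order in orders:
        matches = index.get(order["O_ORDERKEY"], [])  # index lookup on the inner table
        if matches:
            result.extend((order, li) for li in matches)
        else:
            result.append((order, None))  # LEFT OUTER JOIN keeps unmatched outer rows
    return result


orders = [{"O_ORDERKEY": 1}, {"O_ORDERKEY": 2}, {"O_ORDERKEY": 3}]
lineitems = [
    {"PUB_L_ORDERKEY": 1, "PUB_L_LINENUMBER": 1},
    {"PUB_L_ORDERKEY": 1, "PUB_L_LINENUMBER": 2},
    {"PUB_L_ORDERKEY": 2, "PUB_L_LINENUMBER": 1},
]
for order, li in left_join_nested_loop(orders, lineitems):
    print(order["O_ORDERKEY"], li and li["PUB_L_LINENUMBER"])
```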
The FORCE INDEX hint tells the optimizer to assume that a full scan of the table is very expensive. A table scan is still used when there is no way to use the named index to find rows.
The most likely explanation for the observed behavior is that the optimizer thinks it needs to access every row in the table, and the index suggested in the hint is not a covering index for the query.
Based on the EXPLAIN output, we only see evidence of a single predicate on the JOIN operation. And it looks like the optimizer is choosing CUSTOMER_ORDER_PUBLIC as the driving table for the join, and using an index on the LINEITEM_PUBLIC table.
I'm not sure any of that answers the question you asked. (I'm not sure that there was a question asked.) Absent an actual SQL statement, we are just making guesses.
I have a question: Aside from the FORCE INDEX hint, why would we expect the optimizer to use a particular index? And why would that be a reasonable expectation?

MySQL best index for static table

I have the following table (T) in Mysql:
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| first | varchar(50) | NO | PRI | NULL | |
| second | varchar(50) | NO | PRI | NULL | |
| third | varchar(50) | NO | PRI | NULL | |
| count | bigint(20) | NO | | NULL | |
+--------+-------------+------+-----+---------+-------+
This table contains several million rows. I have created the following indices:
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| T | 0 | PRIMARY | 1 | first | A | 591956 | NULL | NULL | | BTREE | | |
| T | 0 | PRIMARY | 2 | second | A | 67927032 | NULL | NULL | | BTREE | | |
| T | 0 | PRIMARY | 3 | third | A | 271708128 | NULL | NULL | | BTREE | | |
| T | 1 | SECONDARY | 1 | second | A | 398399 | NULL | NULL | | BTREE | | |
| T | 1 | SECONDARY | 2 | third | A | 45284688 | NULL | NULL | | BTREE | | |
| T | 1 | SEC | 1 | second | A | 4382389 | NULL | NULL | | BTREE | | |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Searches of the type:
SELECT * FROM T WHERE first = "WHAT" AND third = "EVER";
and
SELECT * FROM T WHERE first = "WHAT" AND second = "EVER";
are usually fast (results always come back in under 1 second). However, searches like:
SELECT * FROM T WHERE second = "WHAT" AND third = "EVER";
are very slow (usually more than 1 minute). I created the index SEC (see indices table), but that doesn't improve the results.
What index should I use to make these searches faster? (I haven't kept experimenting because the creation of one index takes around 5 hours)
MORE INFO: The table is static (i.e. I won't be adding any more rows - I am only interested in search speed), and disk space is not an issue.
Use additional indexes made up of the columns your queries filter on. If the combination of values is unique, it can serve as the primary key, which gives quicker access than a secondary index (in InnoDB, rows are stored in primary-key order). A table can have only one primary key, however.
Because the table is static, the number of indexes will not hurt performance. Indexes only cost time on updates, deletions and insertions, each of which must update every index of the table.
So, for quicker retrieval with this query, create an index on the second and third columns:
ALTER TABLE T ADD INDEX (second, third);
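Column order matters because a B-tree index keeps its entries sorted by its columns from left to right. Entries matching second = ... AND third = ... sit together only in an index that starts with (second, third). In the primary key (first, second, third) they are scattered, so the whole index has to be scanned. The Python sketch below models each index as the rows sorted by its columns, with hypothetical sample rows. It shows where the matches land, not how MySQL searches.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]


def matching_positions(rows: List[Row], order: Sequence[str], pred: Callable[[Row], bool]) -> List[int]:
    """Positions, in index order, of the entries that satisfy pred.

    The index is modeled as the rows sorted by the `order` columns, left to right.
    Every entry is checked on purpose: this shows where matches sit, not how to find them.
    """
    index = sorted(rows, key=lambda r: tuple(r[c] for c in order))
    return [i for i, r in enumerate(index) if pred(r)]


def is_contiguous(positions: List[int]) -> bool:
    """True if the matches form one run, so a single seek plus a short scan reaches all of them."""
    if not positions:
        return True
    return positions == list(range(positions[0], positions[0] + len(positions)))


def wanted(r: Row) -> bool:
    # WHERE second = "WHAT" AND third = "EVER"
    return r["second"] == "WHAT" and r["third"] == "EVER"


# Hypothetical sample rows of table T
rows = [
    {"first": "a", "second": "WHAT", "third": "EVER"},
    {"first": "b", "second": "WHAT", "third": "EVER"},
    {"first": "a", "second": "x", "third": "y"},
    {"first": "c", "second": "WHAT", "third": "EVER"},
    {"first": "b", "second": "z", "third": "q"},
]
in_primary = matching_positions(rows, ("first", "second", "third"), wanted)
in_second_third = matching_positions(rows, ("second", "third"), wanted)
print(in_primary, is_contiguous(in_primary))            # [0, 2, 4] False
print(in_second_third, is_contiguous(in_second_third))  # [0, 1, 2] True
```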

MySQL index causes queries to become slow

I have a MySQL table with some 20 million rows of data in it.
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| b_id | int(11) | YES | MUL | NULL | |
| order | bigint(20) | YES | MUL | NULL | |
| date | date | YES | | NULL | |
| time | time | YES | | NULL | |
| channel | varchar(8) | YES | MUL | NULL | |
| data | varchar(60) | YES | | NULL | |
| date_system | date | YES | MUL | NULL | |
| time_system | time | YES | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
I had a non-unique index on (b_id, channel, date) to speed up queries like:
select date, left(time,2) as hour, round(data,1) as data
from data_lines
where channel='1'
and b_id='300'
and date >='2013-04-19'
and date <='2013-04-26'
group by date,hour
The problem was that my inserts sometimes overlap, so I wanted to use ON DUPLICATE KEY UPDATE; however, this needs a unique index. So I created a unique index on (b_id, channel, date, time), as these are the four main characteristics that determine whether a value is a duplicate. The inserts now work fine, but my SELECT queries are unacceptably slow.
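ON DUPLICATE KEY UPDATE needs a unique index because that index is what defines a "duplicate". The toy Python model below sketches that rule; the function name and the list-based table are hypothetical. It also models one MySQL behavior that matters for this table, whose key columns are all nullable: a UNIQUE index allows multiple NULLs. A row with NULL in any key column therefore never counts as a duplicate.

```python
from typing import List, Sequence


def insert_on_duplicate_key_update(table: List[dict], unique_cols: Sequence[str],
                                   row: dict, update_cols: Sequence[str]) -> str:
    """Toy upsert: the unique key decides whether `row` duplicates an existing row."""
    key = tuple(row[c] for c in unique_cols)
    if None not in key:  # NULL never equals anything, so a key containing NULL cannot collide
        for existing in table:  # a real unique index does a lookup instead of this scan
            if tuple(existing[c] for c in unique_cols) == key:
                for c in update_cols:
                    existing[c] = row[c]
                return "updated"
    table.append(dict(row))
    return "inserted"


cols = ("b_id", "channel", "date", "time")
table: List[dict] = []
r = {"b_id": 300, "channel": "1", "date": "2013-04-19", "time": "10:00:00", "data": "1.0"}
print(insert_on_duplicate_key_update(table, cols, r, ["data"]))                     # inserted
print(insert_on_duplicate_key_update(table, cols, {**r, "data": "2.0"}, ["data"]))  # updated
print(insert_on_duplicate_key_update(table, cols, {**r, "time": None}, ["data"]))   # inserted
print(insert_on_duplicate_key_update(table, cols, {**r, "time": None}, ["data"]))   # inserted
print(len(table))  # 3
```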
I'm not sure why my SELECTs have become slower since I added the new index:
Is time so unique that the index becomes very large, and therefore slow?
Should I remove the non-unique index to speed things up?
Is my query bad?
Other ideas are welcome!
For the record, order, date_system and time_system are not used at all in indexes or SELECTs, but they do contain data. The inserts are run from C and Python, and the SELECTs from PHP.
As requested, here is the EXPLAIN output:
mysql> explain select date, left(time,2) as hour, round(data,1) as data
from data_lines
where channel='1'
and b_id='300'
and date >='2013-04-19'
and date <='2013-04-26'
group by date,hour;
+----+-------------+-----------+------+--------------------------------+------------+---------+-------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+--------------------------------+------------+---------+-------------+------+----------------------------------------------+
| 1 | SIMPLE | data_lines| ref | update_index,b_id,comp_index | comp_index | 16 | const,const | 3548 | Using where; Using temporary; Using filesort |
+----+-------------+-----------+------+--------------------------------+------------+---------+-------------+------+----------------------------------------------+
The update_index is my unique index of (b_id, channel, date, time) and the comp_index is my non unique index of (b_id, channel, date).
Indexes are:
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| data_lines| 0 | PRIMARY | 1 | id | A | 17918898 | NULL | NULL | | BTREE | | |
| data_lines| 0 | id_UNIQUE | 1 | id | A | 17918898 | NULL | NULL | | BTREE | | |
| data_lines| 0 | update_index | 1 | channel | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 0 | update_index | 2 | b_id | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 0 | update_index | 3 | date | A | 44244 | NULL | NULL | YES | BTREE | | |
| data_lines| 0 | update_index | 4 | time | A | 17918898 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | box_id | 1 | b_id | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | idx | 1 | order | A | 17918898 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | comp_index | 1 | b_id | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | comp_index | 2 | channel | A | 6624 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | comp_index | 3 | date | A | 165915 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | date_system | 1 | date_system | A | 17 | NULL | NULL | YES | BTREE | | |
| data_lines| 1 | mac | 1 | mac | A | 17 | NULL | NULL | YES | BTREE | | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Try explicitly specifying USE INDEX (update_index) in your query.
The optimizer is choosing the wrong index, which is why the query has become slow.
Hope this solves your problem. :)
Since a PRIMARY KEY is a UNIQUE KEY, get rid of the useless UNIQUE(id).
Are any of the columns we are talking about ever NULL? If not, make them NOT NULL. (This is important before promoting the UNIQUE index, because PRIMARY KEY columns must be NOT NULL.)
Unless you need it for some other query, DROP comp_index. It provides no extra benefit (for your INSERT or SELECT) over the 4-column update_index.
Do you use id anywhere else? If not, promote the 4-col unique index to be PRIMARY KEY. This step is likely to speed things up because now it is not bouncing back and forth between the index and the data (to get data).
That leaves 4 other indexes; see if you really need them. (I suggest this because a previous step will make secondary indexes bulkier.)
Change to InnoDB if you are using MyISAM.
When doing lots of ALTERs, do them in a single statement; it will be a lot faster.
ALTER TABLE ...
DROP COLUMN id,
DROP PRIMARY KEY,
DROP INDEX `id_UNIQUE`,
DROP INDEX comp_index,
ADD PRIMARY KEY(channel, b_id, date, time),
ALTER COLUMN ... NOT NULL,
...
ENGINE=InnoDB;
Or, to be more cautious: CREATE the modified table, then INSERT...SELECT to populate it. Then test. Eventually do RENAME TABLE to put it into place.
It is usually a bad idea to split date and time into two columns instead of having a single datetime. But I won't push it, since it probably does not affect this Question much.

Mysql Join on range of time values (no exact relation)

I am evaluating log files for a research project and have inserted them into a MySQL database.
Now I have a query where I need to join data from other tables without having an exact matching value.
The "logdata" table contains the data of mobile units I have to analyze, "basepositions" holds the GPS coordinates of base stations. In two of the data fields of "logdata" the sender position of the corresponding base station is logged. The problem there is: the position of the base station varies slightly over time (GPS fluctuation, just some degrees), so I have to look for the right entry by using the BETWEEN operation as seen in the query below. This is not perfect, but there are only about 100 base stations, so the cost is tolerable here.
The same problem exists in the second join, where I have to get a validity flag out of another table. Both logs are written approximately every second, but they are not synchronized. So I have to scan for the corresponding row, again using BETWEEN with a time range of 1 second.
Because of the number of rows, this second scan lets my execution time explode.
I think the diffuse correlation is the problem here.
The two tables both have the indexes given in the overview below.
Is there a way to speed up the query? Because of the performance problems, it now takes 30 hours to complete on my database setup and returns around 20000 rows.
I appreciate any help.
logdata (~ 300.000.000 entries):
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| unit | tinytext | YES | MUL | NULL | |
| timestamp | bigint(20) | YES | | NULL | |
| logid | int(11) | YES | | NULL | |
| d1 | bigint(20) | YES | | NULL | |
| d2 | bigint(20) | YES | | NULL | |
| d3 | bigint(20) | YES | | NULL | |
| d4 | bigint(20) | YES | | NULL | |
| d5 | bigint(20) | YES | | NULL | |
| d6 | bigint(20) | YES | | NULL | |
| d7 | bigint(20) | YES | | NULL | |
| d8 | bigint(20) | YES | | NULL | |
| d9 | bigint(20) | YES | | NULL | |
| d10 | bigint(20) | YES | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
basepositions (~100 entries):
+----------------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | NULL | |
| GPSLONGITUDE | varchar(50) | YES | | NULL | |
| LOCATION | varchar(100) | YES | | NULL | |
| GPSLATITUDE | varchar(50) | YES | | NULL | |
| GPSALTITUDE | varchar(50) | YES | | NULL | |
| ISUNDERTEST | tinyint(1) | YES | | 0 | |
+----------------------------+--------------+------+-----+---------+-------+
validity (~200.000.000 entries):
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| unit | tinytext | YES | MUL | NULL | |
| timestamp | bigint(20) | YES | | NULL | |
| logid | int(11) | YES | | NULL | |
| d1 | bigint(20) | YES | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
my query so far:
SELECT
logdata.unit,
logdata.timestamp,
logdata.d1,
logdata.d2,
cast(logdata.d3/10000000 as decimal(15, 10)),
cast(logdata.d4/10000000 as decimal(15, 10)),
logdata.d5,
logdata.d6,
logdata.d7,
logdata.d8,
cast(logdata.d9/10000000 as decimal(15, 10)),
cast(logdata.d10/10000000 as decimal(15, 10)),
BASEID,
validity.d1
FROM
logdata
JOIN
basepositions
ON
cast(GPSLATITUDE / 10000000 as decimal(15,10)) BETWEEN cast(d3 / 10000000 as decimal(15,10)) - 0.0001 AND cast(d3 / 10000000 as decimal(15,10)) + 0.0001
AND
cast(GPSLONGITUDE / 10000000 as decimal(15,10)) BETWEEN cast(d4 / 10000000 as decimal(15,10)) - 0.0001 AND cast(d4 / 10000000 as decimal(15,10)) + 0.0001
JOIN
validity
ON
validity.unit = logdata.unit
AND
validity.logid = 12345
AND
validity.timestamp BETWEEN logdata.timestamp - 500 AND logdata.timestamp + 499
WHERE
logdata.unit = "IVS${IVS}"
AND
logdata.logid = 111222
AND
BASEID = 012;
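For each logdata row, the second join asks for validity rows whose timestamp falls in [timestamp - 500, timestamp + 499]. When the validity timestamps for one unit and logid are available in sorted order, as an index on (unit, logid, timestamp) keeps them, each window can be found with a binary search instead of a scan. The Python sketch below illustrates that access pattern with made-up timestamps. It does not show how MySQL actually executes this query, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def window_join(log_ts: List[int], validity: List[Tuple[int, int]],
                before: int = 500, after: int = 499) -> List[Tuple[int, int]]:
    """Pair each log timestamp t with the validity.d1 values whose timestamp is in [t - before, t + after]."""
    validity = sorted(validity)             # (timestamp, d1), sorted like the index
    keys = [ts for ts, _ in validity]
    pairs = []
    for t in log_ts:
        lo = bisect_left(keys, t - before)  # first timestamp >= t - before
        hi = bisect_right(keys, t + after)  # one past the last timestamp <= t + after
        pairs.extend((t, d1) for _, d1 in validity[lo:hi])
    return pairs


log_ts = [1000, 2000, 3000]
validity = [(1400, 1), (1600, 0), (2499, 1), (3600, 1)]
print(window_join(log_ts, validity))  # [(1000, 1), (2000, 0), (2000, 1)]
```

Each probe costs a binary search plus the number of matches, rather than a pass over all validity rows for that unit and logid.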
Indexes:
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| logdata | 0 | PRIMARY | 1 | id | A | 301433830 | NULL | NULL | | BTREE | | |
| logdata | 1 | unit_logid_timestamp | 1 | unit | A | 18 | 6 | NULL | YES | BTREE | | |
| logdata | 1 | unit_logid_timestamp | 2 | logid | A | 18 | NULL | NULL | YES | BTREE | | |
| logdata | 1 | unit_logid_timestamp | 3 | timestamp | A | 301433830 | NULL | NULL | YES | BTREE | | |
+-------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT (the comment field was too small):
I think the problem is the way the join is constructed. EXPLAIN EXTENDED shows that the query optimizer is joining all three tables together, which means 300.000.000 * 200.000.000 * 100 rows to look through.
When I rewrite the join with "validity" to a subquery mysql is just joining "logdata" and "basepositions".
I think data type changes could help with later optimization, but first I think I have to cut the runtime by a few orders of magnitude by optimizing the query plan.
I'm not experienced enough to know what I can do to further optimize this query.
The single query for a timestamp on "validity" returns in no time at all.
The single query for the Basestation position is also very fast.
I don't know how I can convince MySQL to filter first and then join.
EDIT 2:
Here are the indexes you asked for. I got them using "SHOW INDEXES FROM".
indexes for "validity":
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| validity | 0 | PRIMARY | 1 | id | A | 194863653 | NULL | NULL | | BTREE | | |
| validity | 1 | unit_logid_timestamp | 1 | unit | A | 18 | 6 | NULL | YES | BTREE | | |
| validity | 1 | unit_logid_timestamp | 2 | logid | A | 18 | NULL | NULL | YES | BTREE | | |
| validity | 1 | unit_logid_timestamp | 3 | timestamp | A | 194863653 | NULL | NULL | YES | BTREE | | |
+-------------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
indexes for "basepositions":
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| basepositions | 0 | PRIMARY | 1 | ID | A | 109 | NULL | NULL | | BTREE | | |
+----------------------+------------+---------------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EXPLAIN of the query above:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE basepositions const PRIMARY PRIMARY 4 const 1 100.00
1 SIMPLE logdata ref unit_logid_timestamp unit_logid_timestamp 14 const,const 4150932 100.00 Using where
1 SIMPLE validity ref unit_logid_timestamp unit_logid_timestamp 14 const,const 3294136 100.00 Using where
EXPLAIN (after adding indexes for lat/lon):
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE basepositions const PRIMARY,lat_lon,lat,lon PRIMARY 4 const 1 100.00
1 SIMPLE logdata ref unit_logid_timestamp unit_logid_timestamp 14 const,const 4150932 100.00 Using where
1 SIMPLE validity ref unit_logid_timestamp unit_logid_timestamp 14 const,const 3294136 100.00 Using where
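The MySQL manual suggests the product of the values in the rows column of EXPLAIN as a rough indication of how much work a join does. Applied to the EXPLAIN output above, this gives about 1.4 × 10^13 row combinations. That is far below the 300.000.000 * 200.000.000 * 100 from the first edit, because the ref lookups on unit_logid_timestamp narrow logdata and validity first. It is still large enough to go a long way toward explaining the run time. The validity lookup uses only const,const (unit, logid), so the timestamp range is applied afterwards as a filter.

```python
from math import prod

# rows column of the first EXPLAIN: basepositions, logdata, validity
explain_rows = [1, 4150932, 3294136]
combinations = prod(explain_rows)
print(combinations)           # 13673734534752
print(f"{combinations:.2e}")  # 1.37e+13

# The figure from the first edit, for comparison
naive = prod([300_000_000, 200_000_000, 100])
print(f"{naive:.0e}")         # 6e+18
```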

Simple MySQL query with performance issues

I have the following simple MySQL query:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31';
When I run this it takes 0.29 seconds to bring back 36074 results. When I increase my date period to bring back more results (65703) it runs in 0.56. When I run other similar SQL queries on the same server but on different tables (some tables are larger) the results come back in approximately 0.01 seconds.
Although 0.29 seconds isn't slow, this is a basic part of a complex query, and this timing means that it does not scale.
See below for the table definition and indexes.
I know it's not server load as I have the same issue on a development server which has very little usage.
+---------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+--------------+------+-----+---------+----------------+
| mainID | int(11) | NO | PRI | NULL | auto_increment |
| otherID1 | int(11) | NO | MUL | NULL | |
| otherID2 | int(11) | NO | MUL | NULL | |
| otherID3 | int(11) | NO | MUL | NULL | |
| keyword | varchar(200) | NO | MUL | NULL | |
| dateStartCol | date | NO | MUL | NULL | |
| timeStartCol | time | NO | MUL | NULL | |
| dateEndCol | date | NO | MUL | NULL | |
| timeEndCol | time | NO | MUL | NULL | |
| statusCode | int(1) | NO | MUL | NULL | |
| uRL | text | NO | | NULL | |
| hostname | varchar(200) | YES | MUL | NULL | |
| IPAddress | varchar(25) | YES | | NULL | |
| cookieVal | varchar(100) | NO | | NULL | |
| keywordVal | varchar(60) | NO | | NULL | |
| dateTimeCol | datetime | NO | MUL | NULL | |
+---------------------------+--------------+------+-----+---------+----------------+
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| tableName | 0 | PRIMARY | 1 | mainID | A | 661990 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID1 | 1 | otherID1 | A | 330995 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID2 | 1 | otherID2 | A | 25 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID3 | 1 | otherID3 | A | 48 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_dateStartCol | 1 | dateStartCol | A | 187 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_timeStartCol | 1 | timeStartCol | A | 73554 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_dateEndCol | 1 | dateEndCol | A | 188 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_timeEndCol | 1 | timeEndCol | A | 73554 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_keyword | 1 | keyword | A | 82748 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_hostname | 1 | hostname | A | 2955 | NULL | NULL | YES | BTREE | |
| tableName | 1 | idx_dateTimeCol | 1 | dateTimeCol | A | 220663 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_statusCode | 1 | statusCode | A | 2 | NULL | NULL | | BTREE | |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
Explain Output:
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | tableName | range | idx_otherID3,idx_dateStartCol | idx_dateStartCol | 3 | NULL | 66875 | 75.00 | Using where |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
If that is really your query (and not a simplified version of same), then this ought to achieve best results:
CREATE INDEX table_ndx on tableName( otherID3, dateStartCol, mainID);
With otherID3 as the first column of the index, the equality match in the WHERE clause is very fast. The range on dateStartCol, the next column, is then read from the same part of the index. The third column (mainID) is very small and does not slow the index appreciably. It lets MySQL find the value you select directly in the index, with no table access at all.
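The access pattern this index enables can be modeled in Python. The entries are sorted by (otherID3, dateStartCol, mainID). One binary-search seek lands on the first entry with otherID3 = 19 in the date range, and the result ends where the range ends. mainID is read straight from the index entries. The sample entries and function name are hypothetical; dates are ISO strings, which sort in date order.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Entry = Tuple[int, str, int]  # (otherID3, dateStartCol, mainID), as in table_ndx


def covering_range_lookup(index: List[Entry], other_id3: int, d_from: str, d_to: str) -> List[int]:
    """mainID values for otherID3 = other_id3 AND dateStartCol BETWEEN d_from AND d_to."""
    # Tuples compare column by column, so the matching entries form one contiguous run.
    lo = bisect_left(index, (other_id3, d_from))
    hi = bisect_right(index, (other_id3, d_to, float("inf")))
    return [main_id for _, _, main_id in index[lo:hi]]  # no table access: mainID is in the entry


index = sorted([
    (18, "2012-08-15", 1),
    (19, "2012-07-31", 2),
    (19, "2012-08-01", 3),
    (19, "2012-08-20", 4),
    (19, "2012-08-31", 5),
    (19, "2012-09-01", 6),
    (20, "2012-08-10", 7),
])
print(covering_range_lookup(index, 19, "2012-08-01", "2012-08-31"))  # [3, 4, 5]
```

If the table uses InnoDB, every secondary index entry already stores the primary key (mainID). In that case (otherID3, dateStartCol) alone would also cover this query, and listing mainID explicitly does no harm.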
It is important that the keys are in the same index. In what you posted, each key is in an index of its own, so even if MySQL chooses the best index, performance will not be optimal. I'd also try to use fewer indexes, because they have a cost too (shameless plug: Can Indices actually decrease SELECT performance? ).
First, try to add the right key. It seems that dateStartCol is more selective than otherID3:
ALTER TABLE tableName ADD KEY idx_dates(dateStartCol, otherID3);
Second, make sure you select only the rows you need by adding a LIMIT clause to the SELECT. This should speed up the query. Try it like this:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31'
LIMIT 10;
Please also make sure that your MySQL tuned up properly. You may want to check key_buffer_size and innodb_buffer_pool_size as described in http://astellar.com/2011/12/why-is-stock-mysql-slow/
If this is a recurrent or important query then create a multiple column index:
CREATE INDEX index_name ON tableName (otherID3, dateStartCol)
Delete the non used indexes as they make table changes more expensive.
BTW, you don't need two separate columns for date and time. You can combine them in a datetime or timestamp column: one less column and one less index.
The EXPLAIN output shows that it chose the dateStartCol index, so you could also try the opposite column order from the one I suggested above:
CREATE INDEX index_name ON tableName (dateStartCol, otherID3)
Notice that the query's dateStartCol condition will still match about 75% of the rows, so there is little improvement, if any, from using that single-column index.
How unique is otherID3? If its values are not repeated much, you can hint the engine to use its index.