MySQL specific query performance tuning - mysql

I have troubles with MySQL query performance.
Table (InnoDB):
+--------------------+---------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------------------+------+-----+-------------------+-------+
| st_resource_id | varchar(32) | NO | MUL | NULL | |
| st_sub_resource_id | varchar(32) | YES | | NULL | |
| st_title | varchar(500) | YES | | NULL | |
| st_resource_type | varchar(100) | NO | MUL | NULL | |
| st_site_id | tinyint(4) | NO | MUL | NULL | |
| st_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| st_user_id | int(10) unsigned | YES | | NULL | |
| st_full_access | tinyint(1) unsigned | YES | | NULL | |
+--------------------+---------------------+------+-----+-------------------+-------+
Indexes:
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| nr_statistics | 1 | resource_id | 1 | st_resource_id | A | 1546165 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | resource_id | 2 | st_sub_resource_id | A | 1546165 | NULL | NULL | YES | BTREE | |
| nr_statistics | 1 | st_time | 1 | st_time | A | 1546165 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id | 1 | st_site_id | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_resource_type | 1 | st_resource_type | A | 16 | 10 | NULL | | BTREE | |
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
Query:
SELECT st_resource_id AS docId, count(*) AS cnt
FROM nr_statistics
WHERE
st_resource_type = 'document'
AND st_sub_resource_id = 'text'
AND st_time > DATE_SUB(NOW(), INTERVAL 7 DAY)
AND st_site_id = 1
GROUP BY st_resource_id
ORDER BY cnt DESC
LIMIT 0, 5;
Query plan:
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | nr_statistics | index | st_time,st_site_id,st_resource_type | resource_id | 197 | NULL | 1581044 | Using where; Using temporary; Using filesort |
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
Table has ~1,666,383 rows. The query runs extremely slow. In MySQL process list I see this query in "copy to tmp table phase" for a long time (> 1 minute). Query generates heavy I/O load. I can't understand what to do to fix the problem and speed-up query execution.
If the problem is a result of wrong indexes, so what indexes will be right?
UPD. I created new composite index:
| nr_statistics | 1 | st_site_id_2 | 1 | st_site_id | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 2 | st_resource_type | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 3 | st_sub_resource_id | A | 752018 | NULL | NULL | YES | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 4 | st_time | A | 1504037 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 5 | st_resource_id | A | 1504037 | NULL | NULL | | BTREE | |
Now query plan is:
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | nr_statistics | range | st_site_id_2 | st_site_id_2 | 406 | NULL | 21168 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
The query now runs very fast (as 0.0x sec), but I have to force using new index:
SELECT st_resource_id as docId, count( * ) AS Cnt
FROM nr_statistics
USE INDEX (st_site_id_2)
WHERE st_resource_type = 'document'
AND st_sub_resource_id = 'text'
AND st_time > DATE_SUB( NOW( ) , INTERVAL 7 DAY )
AND st_site_id = 1
GROUP BY st_resource_id
ORDER BY cnt DESC
LIMIT 0 , 5;
While the problem is resolved (not beautiful but effective way), I still have some open questions (see comments).

Create a composite index on (st_site_id, st_resource_type, st_sub_resourse_id, st_time, st_resource_id).
However, you will still have temporary and filesort in the plan because you are ordering on COUNT(*) which is not indexable.
If you need to run this query fast and often, you would have to create an aggregate table which would store counts for each site/resource/subresourse/week combination and update it in a trigger.

Did you try creating a composite index on st_resource_type, st_resource_id, st_time and st_site_id? It looks to me like you have several indexes, but most are on a single column, or maybe 2 columns. By having a composite index with more of the columns you need, it may improve the performance.

When doing queries with multiple where clauses the order in which you write them should match the order in which you wrote your query.
In your particular case it would be:
CREATE INDEX stats_index ON nr_statistics (st_resource_type, st_sub_resource_id, st_time, st_site_id);
This should give you a pretty good speed boost.

Related

Query optimization using any method

I need to speed up this query. What can I do?
select i.resp_id as id from int_result i, response_set rs, cx_store_child cbu
where rs.survey_id IN(5550512,5550516,5550521,5550520,5590351,5590384,5679615,5679646,5691634,5699259,5699266,5699270)
and i.q_id IN(52603091,52251250,52250724,52251333,52919541,52920117,54409178,54409806,54625102,54738933,54739117,54739221)
and rs.t >= '2017-08-30 00:00:00' and rs.t <= '2017-09-30 00:00:00'
and i.response_set_id = rs.id and rs.cx_business_unit_id = cbu.child_bu_id
and cbu.business_unit_id = 30850
group by rs.cx_business_unit_id, i.a_id, extract(day from rs.t)
------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| int_result | 0 | PRIMARY | 1 | id | A | 240843099 | NULL | NULL | | BTREE | | |
| int_result | 1 | q_id | 1 | q_id | A | 1442174 | NULL | NULL | | BTREE | | |
| int_result | 1 | a_id | 1 | a_id | A | 20070258 | NULL | NULL | | BTREE | | |
| int_result | 1 | resp_id | 1 | resp_id | A | 120421549 | NULL | NULL | | BTREE | | |
| int_result | 1 | response_set_id | 1 | response_set_id | A | 26760344 | NULL | NULL | | BTREE | | |
| int_result | 1 | survey_id | 1 | survey_id | A | 503855 | NULL | NULL | YES | BTREE | | |
| int_result | 1 | survey_id_2 | 1 | survey_id | A | 1459655 | NULL | NULL | YES | BTREE | | |
| int_result | 1 | survey_id_2 | 2 | q_id | A | 2736853 | NULL | NULL | | BTREE | | |
+------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+------
--+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+----------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| response_set | 0 | PRIMARY | 1 | id | A | 14307454 | NULL | NULL | | BTREE | | |
| response_set | 1 | survey_id | 1 | survey_id | A | 223553 | NULL | NULL | | BTREE | | |
| response_set | 1 | id | 1 | id | A | 14307454 | NULL | NULL | | BTREE | | |
| response_set | 1 | external_id | 1 | external_id | A | 2921 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | panel_member_id | 1 | panel_member_id | A | 357686 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | email_group | 1 | email_group | A | 21259 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | survey_timestamp_idx | 1 | survey_id | A | 433559 | NULL | NULL | | BTREE | | |
| response_set | 1 | survey_timestamp_idx | 2 | t | A | 14307454 | NULL | NULL | YES | BTREE | | |
| response_set | 1 | bu_id | 1 | cx_business_unit_id | A | 2246 | NULL | NULL | YES | BTREE | | |
----------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cx_store_child | 0 | PRIMARY | 1 | id | A | 13667 | NULL | NULL | | BTREE | | |
| cx_store_child | 0 | bu_child_ref | 1 | business_unit_id | A | 13667 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 0 | bu_child_ref | 2 | child_bu_id | A | 13667 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 1 | cx_feedback_id | 1 | cx_feedback_id | A | 506 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 1 | business_unit_id | 1 | business_unit_id | A | 13667 | NULL | NULL | YES | BTREE | | |
| cx_store_child | 1 | child_bu_id | 1 | child_bu_id | A | 13667 | NULL | NULL | YES | BTREE | | |
+----------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+
You seem to have an index on response_set(survey_id, t).
Try creating a so-called compound covering index on
response_set(t, survey_id, cx_business_unit_id)
This may help optimize the part of your query using that table. Why? Your query calls for a range scan on t, and a column used in a range scan must be first in its compound index.
Similarly, an index on int_result (q_id, resp_id, response_set_id) may help extract the data you need from that table.
Some notes:
It's hard to tell what your query does. Maybe some explanation will help you get better results here?
and rs.t >= '2017-08-30 00:00:00' and rs.t <= '2017-09-30 00:00:00' is probably incorrect as to the end of the time range. It probably contains an off-by-one error. Do you want < in place of <= ? What you have given includes the records with timestamps precisely at midnight on 30-Sep-2017, but no records on that day after midnight.
You have one index on int_result(survey_id, q_id) and another on int_result(survey_id). The latter index is entirely redundant with the former and you can drop it.
You seem to have lots of single-column indexes. Pro tip: don't add such indexes unless you know you need them. They rarely help speed up arbitrary queries and always slow down insertions and updates. Why might you need them? If you have a query you know needs them, or you need to enforce uniqueness. Drop the indexes you don't need.
Use 21st-century JOIN syntax instead of the old-timey comma-join syntax as follows. It's easier to read.
from int_result i
join response_set rs on i.response_set_id = rs.id
join cx_store_child cbu on rs.cx_business_unit_id = cbu.child_bu_id
Read this. You're maintaining a large database and it's worth your time to learn a lot about indexing. http://use-the-index-luke.com/
There are many ways that the Optimizer may attempt to perform the query. The following indexes give it some flexibility to find the optimal order of hitting the tables:
cbu: INDEX(business_unit_id, child_bu_id)
rs: INDEX(t, cx_business_unit_id, survey_id)
rs: INDEX(survey_id, t, cx_business_unit_id)
rs: INDEX(cx_business_unit_id, survey_id, t)
i: INDEX(response_set_id, q_id)
i: INDEX(q_id, response_set_id)
I arranged for rs and cbu to have "covering" indexes in all cases; this helps some.
(Yes, you should change to JOIN...ON as O. Jones suggests. And the rest of his suggestions.)
Before further discussion, please provide SHOW CREATE TABLE; the could be datatype issues, too.
A PRIMARY KEY is a UNIQUE key is an INDEX -- so INDEX(id) in rs is redundant.

Why does MySQL not use optimal indexes

I'm trying to optimize my query, however, MySQL seems to be utilizing non-optimal indexes on the query and I can't seem to figure out what is wrong. My query is as follows:
SELECT SQL_CALC_FOUND_ROWS deal_ID AS ID,dealTitle AS dealSaving,
storeName AS title,deal_URL AS dealURL,dealDisclaimer,
dealType, providerName,providerLogo AS providerIMG,createDate,
latitude AS lat,longitude AS lng,'local' AS type,businessType,
address1,city,dealOriginalPrice,NULL AS dealDiscountPercent,
dealPrice,scoringBase, smallImage AS smallimage,largeImage AS image,
storeURL AS storeAlias,
exp(-power(greatest(0,
abs(69.0*DEGREES(ACOS(0.82835377099147 *
COS(RADIANS(latitude)) * COS(RADIANS(-118.4-longitude)) +
0.56020534635454*SIN(RADIANS(latitude)))))-2),
2)/(5.7707801635559)) *
scoringBase * IF(submit_ID IN (18381),
IF(businessType = 1,1.3,1.2),IF(submit_ID IN (54727),1.19, 1)
) AS distance
FROM local_deals
WHERE latitude BETWEEN 33.345362318841 AND 34.794637681159
AND longitude BETWEEN -119.61862872928 AND -117.18137127072
AND state = 'CA'
AND country = 'US'
ORDER BY distance DESC
LIMIT 48 OFFSET 0;
Listing the indexes on the table reveals:
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| local_deals | 0 | PRIMARY | 1 | id | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 0 | unique_deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | store_ID | 1 | store_ID | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeOnline_ID | 1 | storeOnline_ID | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeChain_ID | 1 | storeChain_ID | A | 117 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | userProvider_ID | 1 | userProvider_ID | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | expirationDate | 1 | expirationDate | A | 3127 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | createDate | 1 | createDate | A | 96946 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | city | 1 | city | A | 17626 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | state | 1 | state | A | 138 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | zip | 1 | zip | A | 38778 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | country | 1 | country | A | 39 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | latitude | 1 | latitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | longitude | 1 | longitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | eventDate | 1 | eventDate | A | 4215 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | isNowDeal | 1 | isNowDeal | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | businessType | 1 | businessType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | dealType | 1 | dealType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | submit_ID | 1 | submit_ID | A | 5 | NULL | NULL | YES | BTREE | | |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Running explain extended reveals:
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | ref | state,country,latitude,longitude | state | 35 | const | 52472 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
There are around 200k rows in the table. What is strange is that it is ignoring the latitude and longitude indexes as those should filter the table more. Running a query where I remove the "state" and "country" where commands reveals the following explain:
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | range | latitude,longitude | longitude | 5 | NULL | 30662 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
This shows that the longitude index would better filter the table to 30,662 rows. Am I missing something here? How can I get MySQL to use all queries. Note that the table is InnoDB and I'm using MySQL 5.5.
The best index for your query is a composite index on (country, state, latitude, longitude) (country and state could be swapped). MySQL has good documentation on multi-column indexes, which is here.
Basically, latitude and longitude are not particularly selective individually. Unfortunately, the standard B-tree index only supports one inequality, and your query has two.
Actually, if you want GIS processing, then you should use a spatial extension to MySQL.
Depending on the size of your table, Gordon's suggested index may be "good enough". If you need to get even more performance, you need to go to a 2D partitioning technique, wherein you partition on latitude and arrange for the InnoDB PRIMARY KEY to begin with longitude. More details, and sample code, are available in my article.
A generic technique for problems like this is to build a subquery with these properties:
It returns no more than LIMIT rows; and those are all that you need.
There is a "covering index" for the columns involved, plus the PRIMARY KEY.
You are using InnoDB.
Something like
SELECT b. ..., a.distance
FROM local_deals b
JOIN (
SELECT id,
(...) AS distance,
FROM local_deals
WHERE latitude BETWEEN 33.34536 AND 34.79464
AND longitude BETWEEN -119.61863 AND -117.18137
AND state = 'CA'
AND country = 'US'
ORDER BY distance ASC
LIMIT 48 OFFSET 0
) AS a ON b.id = a.id
ORDER BY a.distance;
INDEX(country, state, latitude, longitude, id) -- `id` is the PK
-- country and state first (because of '='); id last.
Why this helps...
The index is "covering", so the lengthy scan (of a lot more than 48 rows) is done entirely in the index's BTree. This cuts down on I/O for huge tables.
All the other fields (b.*) are not hauled around through tmp tables, etc. Only 48 are sets of those fields are dealt with.
The 48 lookups by id are especially efficient in InnoDB due to the "clustered PK".
When working with "huge" tables, where I/O dominates, this technique can be counted thus:
1, or a small number of, blocks in the index are needed for the subquery. Note that the desired records are consecutive, or nearly so. (OK, if there are 30K to look through, it could be more than 100 blocks; hence my comment about shrinking the bounding box to start with.)
Then 48 (LIMIT) random fetches via id get the 48 rows.
Without the subquery, the bulky rows need to be fetched. And, depending on the index used, that could be up to 30K blocks fetched. That's orders of magnitude slower.
Also, 48 rows versus 30K rows will be written to a tmp table for sorting (ORDER BY).

Simple MySQL query with performance issues

I have the following simple MySQL query:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31';
When I run this it takes 0.29 seconds to bring back 36074 results. When I increase my date period to bring back more results (65703) it runs in 0.56. When I run other similar SQL queries on the same server but on different tables (some tables are larger) the results come back in approximately 0.01 seconds.
Although 0.29 isn't slow - this is a basic part for a complex query and this timing means that it is not scalable.
See below for the table definition and indexes.
I know it's not server load as I have the same issue on a development server which has very little usage.
+---------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+--------------+------+-----+---------+----------------+
| mainID | int(11) | NO | PRI | NULL | auto_increment |
| otherID1 | int(11) | NO | MUL | NULL | |
| otherID2 | int(11) | NO | MUL | NULL | |
| otherID3 | int(11) | NO | MUL | NULL | |
| keyword | varchar(200) | NO | MUL | NULL | |
| dateStartCol | date | NO | MUL | NULL | |
| timeStartCol | time | NO | MUL | NULL | |
| dateEndCol | date | NO | MUL | NULL | |
| timeEndCol | time | NO | MUL | NULL | |
| statusCode | int(1) | NO | MUL | NULL | |
| uRL | text | NO | | NULL | |
| hostname | varchar(200) | YES | MUL | NULL | |
| IPAddress | varchar(25) | YES | | NULL | |
| cookieVal | varchar(100) | NO | | NULL | |
| keywordVal | varchar(60) | NO | | NULL | |
| dateTimeCol | datetime | NO | MUL | NULL | |
+---------------------------+--------------+------+-----+---------+----------------+
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| tableName | 0 | PRIMARY | 1 | mainID | A | 661990 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID1 | 1 | otherID1 | A | 330995 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID2 | 1 | otherID2 | A | 25 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID3 | 1 | otherID3 | A | 48 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_dateStartCol | 1 | dateStartCol | A | 187 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_timeStartCol | 1 | timeStartCol | A | 73554 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_dateEndCol | 1 | dateEndCol | A | 188 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_timeEndCol | 1 | timeEndCol | A | 73554 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_keyword | 1 | keyword | A | 82748 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_hostname | 1 | hostname | A | 2955 | NULL | NULL | YES | BTREE | |
| tableName | 1 | idx_dateTimeCol | 1 | dateTimeCol | A | 220663 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_statusCode | 1 | statusCode | A | 2 | NULL | NULL | | BTREE | |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
Explain Output:
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | tableName | range | idx_otherID3,idx_dateStartCol | idx_dateStartCol | 3 | NULL | 66875 | 75.00 | Using where |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
If that is really your query (and not a simplified version of same), then this ought to achieve best results:
CREATE INDEX table_ndx on tableName( otherID3, dateStartCol, mainID);
The first index entry means that the first match in the WHERE is very fast; the same also applies with dateStartCol. The third field is very small and does not slow the index appreciably, but allows for the datum you require to be found immediately in the index with no table access at all.
It is important that the keys are in the same index. In the EXPLAIN you posted, each key is in an index of its own, so even if MySQL chooses the best index, the performances will not be optimal. I'd try and use less indexes, for they also have a cost (shameless plug: Can Indices actually decrease SELECT performance? ).
First try to add the right key. It seems like dateStartCol is more selective than otherID3
ALTER TABLE tableName ADD KEY idx_dates(dateStartCol, dateStartCol)
Second - please make sure you select only rows you need by adding LIMIT clause to the SELECT. This will should up the query. Try like this:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31'
LIMIT 10;
Please also make sure that your MySQL tuned up properly. You may want to check key_buffer_size and innodb_buffer_pool_size as described in http://astellar.com/2011/12/why-is-stock-mysql-slow/
If this is a recurrent or important query then create a multiple column index:
CREATE INDEX index_name ON tableName (otherID3, dateStartCol)
Delete the non used indexes as they make table changes more expensive.
BTW you don't need two separate columns for date and time. You can combine then in a datetime or timestamp type. One less column and one less index.
The explain output shows it chose the dateStartCol index so you could try the opposite I suggested above:
CREATE INDEX index_name ON tableName (dateStartCol, otherID3)
Notice that the query's dateStartCol condition will still get 75% of the rows so not much improvement, if any, in using that single index.
How unique is otherID3? If there are not many repeated otherID3 you can hint the engine to use it.

How can I speed up a MySQL query with WHERE clauses on two columns?

I am trying to speed up a query on a large table with WHERE clauses on two columns, as far as I can, MySQL is only using the ALERT_ID column.
Is there a way to rewrite this query using both indices?
SHOW_INDEX and EXPLAIN output is below.
show index from alert_hit;
+-----------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| alert_hit | 0 | PRIMARY | 1 | id | A | 15181402 | NULL | NULL | | BTREE | | |
| alert_hit | 1 | alert_id | 1 | alert_id | A | 20 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | timestamp | 1 | timestamp | A | 446511 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | data_source_id | 1 | data_source_id | A | 20 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | filter_syndicated | 1 | filter_syndicated | A | 20 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | unique_id | 1 | unique_id | A | 5060467 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | date_created | 1 | date_created | A | 281137 | NULL | NULL | | BTREE | | |
| alert_hit | 1 | language | 1 | language | A | 20 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | region | 1 | region | A | 42406 | NULL | NULL | YES | BTREE | | |
| alert_hit | 1 | market_rank | 1 | market_rank | A | 20 | NULL | NULL | YES | BTREE | | |
+-----------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
explain select count(id) as history FROM alert_hit force index(alert_id, timestamp) where alert_id in (9045,9046,9047,9048,9049,9050,9051,9052,9330,9332) AND timestamp between DATE_SUB( NOW(), INTERVAL 1*2 day) and DATE_SUB( NOW(), INTERVAL 1 day);
+----+-------------+-----------+-------+--------------------+----------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+--------------------+----------+---------+------+-------+-------------+
| 1 | SIMPLE | alert_hit | range | alert_id,timestamp | alert_id | 5 | NULL | 99578 | Using where |
+----+-------------+-----------+-------+--------------------+----------+---------+------+-------+-------------+
You need to have one index on both fields
ALTER TABLE alert_hit ADD INDEX `IDX-alert_id-timestamp` (`alert_id`, `timestamp`);
Also MySQL will use the multi column index up to the first field for which there is a range condition in the WHERE clause, so in this case order matters and timestamp should be last in the index.
As suggested by #spencer7593 selecting COUNT(1) instead of count(id) might also be better.

Index IN and a range

I need to find the best index for this query:
SELECT c.id id, type
FROM Content c USE INDEX (type_proc_last_cat)
LEFT JOIN Battles b ON c.id = b.id
WHERE type = 1
AND processing_status = 1
AND category IN (13, 19)
AND status = 4
ORDER BY last_change DESC
LIMIT 100";
The tables look like this:
mysql> describe Content;
+-------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| type | tinyint(3) unsigned | NO | MUL | NULL | |
| category | bigint(20) unsigned | NO | | NULL | |
| processing_status | tinyint(3) unsigned | NO | | NULL | |
| last_change | int(10) unsigned | NO | | NULL | |
+-------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 1 | type | A | 4 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 2 | processing_status | A | 20 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 3 | last_change | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 4 | category | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> describe Battles;
+---------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| status | tinyint(4) unsigned | NO | | NULL | |
| status_last_changed | int(11) unsigned | NO | | NULL | |
+---------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Battles;
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Battles | 0 | PRIMARY | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 2 | status | A | 1215 | NULL | NULL | | BTREE | |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
And I get output like this:
mysql> explain
-> SELECT c.id id, type
-> FROM Content c USE INDEX (type_proc_last_cat)
-> LEFT JOIN Battles b USE INDEX (id_status) ON c.id = b.id
-> WHERE type = 1
-> AND processing_status = 1
-> AND category IN (13, 19)
-> AND status = 4
-> ORDER BY last_change DESC
-> LIMIT 100;
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| 1 | SIMPLE | c | ref | type_proc_last_cat | type_proc_last_cat | 2 | const,const | 1352 | Using where; Using index |
| 1 | SIMPLE | b | eq_ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
The trouble is the rows count for the Content table. It appears MySQL is unable to effectively use both last_change and category in the type_proc_last_cat index. If I switch the order of last_change and category, fewer rows are selected but it results in a filesort for the ORDER BY, meaning it pulls all the matching rows from the database. That is worse, since there are 100,000+ rows in both tables.
Tables are both InnoDB, so keep in mind that the PRIMARY key is appended to every other index. So the index the index type_proc_last_cat above behaves likes it's on (type, processing_status, last_change, category, id). I am aware I could change the PRIMARY key for Battles to (id, status) and drop the id_status index (and I may just do that).
Edit: Any value for type, category, processing_status, and status is less than 20% of the total values. last_change and status_last_change are unix timestamps.
Edit: If I use a different index with category and last_change in reverse order, I get this:
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 1 | type | A | 6 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 2 | processing_status | A | 26 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 3 | category | A | 228 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 4 | last_change | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain SELECT c.id id, type FROM Content c USE INDEX (type_proc_cat_last) LEFT JOIN Battles b
USE INDEX (id_status) ON c.id = b.id WHERE type = 1 AND processing_status = 1 AND category IN (13, 19) AND status = 4 ORDER BY last_change DESC LIMIT 100;
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| 1 | SIMPLE | c | range | type_proc_cat_last | type_proc_cat_last | 10 | NULL | 165 | Using where; Using index; Using filesort |
| 1 | SIMPLE | b | ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
The filesort worries me as it tells me MySQL pulls all the matching rows first, before sorting. This will be a big problem when there are 100,000+.
rows field in EXPLAIN doesn't reflect the number of rows that actually were read. It reflects the number of rows that possible will be affected. Also it doesn't rely on LIMIT, because LIMIT is applied after the plan has been calculated.
So you don't need to worry about it.
Also I would suggest you to swap last_change and category in type_proc_last_cat so mysql can try to use the last index part (last_change) for sorting purposes.