Slow query grouping on blob column - mysql

I'm using the following query to extract the frequent short values from a mediumblob column :
select bytes, count(*) as n
from pr_value
where bytes is not null && length(bytes)<11 and variable_id=5783
group by bytes order by n desc limit 10;
The problem I have is that this query takes too much time (about 10 seconds with less than 1 million records) :
mysql> select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+-------+----+
| bytes | n |
+-------+----+
| 32 | 21 |
| 27 | 20 |
| 52 | 20 |
| 23 | 19 |
| 25 | 19 |
| 26 | 19 |
| 28 | 19 |
| 29 | 19 |
| 30 | 19 |
| 31 | 19 |
+-------+----+
The table is as follows (unrelated columns not shown) :
mysql> describe pr_value;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| product_id | int(11) | NO | PRI | NULL | |
| variable_id | int(11) | NO | PRI | NULL | |
| author_id | int(11) | NO | PRI | NULL | |
| bytes | mediumblob | YES | MUL | NULL | |
+-------------+---------------+------+-----+---------+-------+
The type is mediumblob because most values are big. Less than 10% are short as the ones I'm looking for with this specific query.
I have the following indexes :
mysql> show index from pr_value;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pr_value | 0 | PRIMARY | 1 | product_id | A | 8961 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 3 | author_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 1 | bytes | 1 | bytes | A | 842402 | 10 | NULL | YES | BTREE | | |
| pr_value | 1 | bytes | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
MySQL explains my query like this :
mysql> explain select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | pr_value | range | bytes | bytes | 13 | NULL | 421201 | Using where; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
Note that the condition on the length of the bytes column can be removed without changing the duration.
What can I do to make this query fast ?
Of course i'd prefer not to have to add columns.

your index on (bytes, variable_id) is not very smart. If you always have a variable_id clause in your queries you should add index with variable_id first :
(variable_id, bytes)
It depend on how discriminant variable_id is. But it sould help.
Another tip is to add a new indexed column with the result of "length(bytes)<11" :
update pr_value set small = length(bytes)<11;
Add a new index with (small,variable_id).

Why are you GROUP BY'ing the blob column? I'd imagine that's the bottleneck as then the Query actually does a compare against all blob columns to each other. Is it because you want unique values for the BLOB? I think the DISTINCT keyword might perform better than the GROUP BY.

Related

MySQL index_merge is causing the query to run 10x slower

I have a table with the following properties :
+-----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | <null> | auto_increment |
| c2 | varchar(255) | YES | MUL | <null> | |
| c3 | int(11) | YES | | <null> | |
| c4 | varchar(255) | YES | | <null> | |
| c5 | varchar(255) | YES | | <null> | |
| c6 | int(11) | YES | MUL | <null> | |
| c7 | int(11) | YES | | <null> | |
| c8 | int(11) | YES | | <null> | |
| c9 | datetime | YES | | <null> | |
| c10 | datetime | YES | | <null> | |
| c11 | char(40) | YES | UNI | <null> | |
| c12 | tinyint(1) | NO | MUL | 1 | |
| c13 | text | YES | | <null> | |
| c14 | int(11) | YES | MUL | <null> | |
| c15 | varchar(64) | YES | MUL | <null> | |
+-----------------------+--------------+------+-----+---------+----------------+
show index from table_one; shows the following output :
+-------------------+------------+--------------------------------------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+--------------------------------------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table_one | 0 | PRIMARY | 1 | id | A | 1621972 | NULL | NULL | | BTREE | | |
| table_one | 0 | c11 | 1 | c11 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 1 | c2 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 2 | c6 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 3 | c8 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 4 | c14 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 1 | c12 | 1 | c12 | A | 1 | NULL | NULL | | BTREE | | |
| table_one | 1 | c6 | 1 | c6 | A | 20794 | NULL | NULL | YES | BTREE | | |
| table_one | 1 | c14 | 1 | c14 | A | 577 | NULL | NULL | YES | BTREE | | |
| table_one | 1 | c15 | 1 | c15 | A | 5 | NULL | NULL | YES | BTREE | | |
+-------------------+------------+--------------------------------------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Now, when I run the following query, it takes around 5.8 seconds average :
select * from table_one
where c6 = 12345 and c14 = 12
limit 10 offset 0;
When I run explain on the above query, it says it has used index_merge:
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+------+-------------------------+
| 1 | SIMPLE | table_one | index_merge | ....................... | c14, c6 | 5,5 | NULL | 9 | Using intersect(c14,c6);|
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+------+-------------------------+
But if I force the table to use index on c6 only, it returns results in 0.6 seconds average :
select * from table_one force index(c6) where c6 = 12345 and c14 = 12 limit 10 offset 0;
Why is MySQL using index_merge on its own and making it slow? I am aware that I don't have composite index on c6, c14, but they exist individually.
Also, the explain query for the force index shows more count in the rows accessed to perform the query, but is still 10x faster.
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| 1 | SIMPLE | table_one | ref | ....................... | c6 | 5 | const | 22388 | Using where; |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
This is causing our production to go down when someone hits the APIs at a high rate. MySQL just doesn't return any results for 59 seconds and the query gets timed out.
Plus, I can't really add composite indexes or change schema without downtime since we already have 2 million entries in it.
The current temporary fix is to add the force index(c6) to the query, but I am not really sure how scalable it would be or if we might end up having problems later on.
EDIT 1
Could the slowness be because of the order in which the index_merge is done?
More information regarding c6 and c14: Consider c6 as countries and c14 as states.
EDIT 2 : 2020-06-15 07:52:35 UTC :
I tried running the query by forcing the index of c14 and it turns out to be slower by roughly 3x:
select * from table_one force index(c14) where c6 = 12345 and c14 = 12 limit 10 offset 0;
The query took 2.1 seconds.
And the explain query gives the following output:
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| 1 | SIMPLE | table_one | ref | ....................... | c14 | 5 | const | 730 | Using where; |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
The rows to be accessed by the query is 30x less than when the index is forced on c6. i.e. here the rows are 730, whereas with the previous query, its 22k. What factors can make this index slow even with lesser rows to access?
Some more information if it helps in any manner :
mysql> select count(*) from table_one where c14 is null;
+----------+
| count(*) |
+----------+
| 7490 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from table_one;
+----------+
| count(*) |
+----------+
| 1936278 |
+----------+
1 row in set (1.68 sec)
mysql> select count(*) from table_one where c6 is null;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
Consider
use (dbname);
ALTER TABLE table_one ADD INDEX table_one_c6_and_c14 (c6,c14);
remove the force request in your query.
And let us know how long it took to create the multi column index, please.
And the time to complete your query, now that an appropriate index is available.
select *
from table_one force index(c14)
where c6 = 12345
and c14 = 12
limit 10 offset 0;
Discussion:
You forced it to use INDEX(c14)
It will reach into that index at the first entry for c14 = 12.
Then it will scan, quite efficiently, across the rows with "12".
Each will be checked for c6 = 12345. This is not as efficient as if there were INDEX(c14, c6).
It might find 10 (cf limit) such rows quickly, or it might have to step over thousands of rows with the 'wrong' value of c6.
That is, the query time (for this query) depends a lot on the distribution of the data.
With INDEX(c14, c6), only 10 index rows need be touched -- much faster, and (relatively) consistent speed.
INDEX(c6, c14) would as fast as the same as INDEX(c14, c6).
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Why does MySQL not use optimal indexes

I'm trying to optimize my query, however, MySQL seems to be utilizing non-optimal indexes on the query and I can't seem to figure out what is wrong. My query is as follows:
SELECT SQL_CALC_FOUND_ROWS deal_ID AS ID,dealTitle AS dealSaving,
storeName AS title,deal_URL AS dealURL,dealDisclaimer,
dealType, providerName,providerLogo AS providerIMG,createDate,
latitude AS lat,longitude AS lng,'local' AS type,businessType,
address1,city,dealOriginalPrice,NULL AS dealDiscountPercent,
dealPrice,scoringBase, smallImage AS smallimage,largeImage AS image,
storeURL AS storeAlias,
exp(-power(greatest(0,
abs(69.0*DEGREES(ACOS(0.82835377099147 *
COS(RADIANS(latitude)) * COS(RADIANS(-118.4-longitude)) +
0.56020534635454*SIN(RADIANS(latitude)))))-2),
2)/(5.7707801635559)) *
scoringBase * IF(submit_ID IN (18381),
IF(businessType = 1,1.3,1.2),IF(submit_ID IN (54727),1.19, 1)
) AS distance
FROM local_deals
WHERE latitude BETWEEN 33.345362318841 AND 34.794637681159
AND longitude BETWEEN -119.61862872928 AND -117.18137127072
AND state = 'CA'
AND country = 'US'
ORDER BY distance DESC
LIMIT 48 OFFSET 0;
Listing the indexes on the table reveals:
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| local_deals | 0 | PRIMARY | 1 | id | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 0 | unique_deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | store_ID | 1 | store_ID | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeOnline_ID | 1 | storeOnline_ID | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeChain_ID | 1 | storeChain_ID | A | 117 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | userProvider_ID | 1 | userProvider_ID | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | expirationDate | 1 | expirationDate | A | 3127 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | createDate | 1 | createDate | A | 96946 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | city | 1 | city | A | 17626 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | state | 1 | state | A | 138 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | zip | 1 | zip | A | 38778 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | country | 1 | country | A | 39 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | latitude | 1 | latitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | longitude | 1 | longitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | eventDate | 1 | eventDate | A | 4215 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | isNowDeal | 1 | isNowDeal | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | businessType | 1 | businessType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | dealType | 1 | dealType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | submit_ID | 1 | submit_ID | A | 5 | NULL | NULL | YES | BTREE | | |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Running explain extended reveals:
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | ref | state,country,latitude,longitude | state | 35 | const | 52472 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
There are around 200k rows in the table. What is strange is that it is ignoring the latitude and longitude indexes as those should filter the table more. Running a query where I remove the "state" and "country" where commands reveals the following explain:
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | range | latitude,longitude | longitude | 5 | NULL | 30662 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
This shows that the longitude index would better filter the table to 30,662 rows. Am I missing something here? How can I get MySQL to use all queries. Note that the table is InnoDB and I'm using MySQL 5.5.
The best index for your query is a composite index on (country, state, latitude, longitude) (country and state could be swapped). MySQL has good documentation on multi-column indexes, which is here.
Basically, latitude and longitude are not particularly selective individually. Unfortunately, the standard B-tree index only supports one inequality, and your query has two.
Actually, if you want GIS processing, then you should use a spatial extension to MySQL.
Depending on the size of your table, Gordon's suggested index may be "good enough". If you need to get even more performance, you need to go to a 2D partitioning technique, wherein you partition on latitude and arrange for the InnoDB PRIMARY KEY to begin with longitude. More details, and sample code, are available in my article.
A generic technique for problems like this is to build a subquery with these properties:
It returns no more than LIMIT rows; and those are all that you need.
There is a "covering index" for the columns involved, plus the PRIMARY KEY.
You are using InnoDB.
Something like
SELECT b. ..., a.distance
FROM local_deals b
JOIN (
SELECT id,
(...) AS distance,
FROM local_deals
WHERE latitude BETWEEN 33.34536 AND 34.79464
AND longitude BETWEEN -119.61863 AND -117.18137
AND state = 'CA'
AND country = 'US'
ORDER BY distance ASC
LIMIT 48 OFFSET 0
) AS a ON b.id = a.id
ORDER BY a.distance;
INDEX(country, state, latitude, longitude, id) -- `id` is the PK
-- country and state first (because of '='); id last.
Why this helps...
The index is "covering", so the lengthy scan (of a lot more than 48 rows) is done entirely in the index's BTree. This cuts down on I/O for huge tables.
All the other fields (b.*) are not hauled around through tmp tables, etc. Only 48 are sets of those fields are dealt with.
The 48 lookups by id are especially efficient in InnoDB due to the "clustered PK".
When working with "huge" tables, where I/O dominates, this technique can be counted thus:
1, or a small number of, blocks in the index are needed for the subquery. Note that the desired records are consecutive, or nearly so. (OK, if there are 30K to look through, it could be more than 100 blocks; hence my comment about shrinking the bounding box to start with.)
Then 48 (LIMIT) random fetches via id get the 48 rows.
Without the subquery, the bulky rows need to be fetched. And, depending on the index used, that could be up to 30K blocks fetched. That's orders of magnitude slower.
Also, 48 rows versus 30K rows will be written to a tmp table for sorting (ORDER BY).

Simple MySQL query with performance issues

I have the following simple MySQL query:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31';
When I run this it takes 0.29 seconds to bring back 36074 results. When I increase my date period to bring back more results (65703) it runs in 0.56. When I run other similar SQL queries on the same server but on different tables (some tables are larger) the results come back in approximately 0.01 seconds.
Although 0.29 isn't slow - this is a basic part for a complex query and this timing means that it is not scalable.
See below for the table definition and indexes.
I know it's not server load as I have the same issue on a development server which has very little usage.
+---------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+--------------+------+-----+---------+----------------+
| mainID | int(11) | NO | PRI | NULL | auto_increment |
| otherID1 | int(11) | NO | MUL | NULL | |
| otherID2 | int(11) | NO | MUL | NULL | |
| otherID3 | int(11) | NO | MUL | NULL | |
| keyword | varchar(200) | NO | MUL | NULL | |
| dateStartCol | date | NO | MUL | NULL | |
| timeStartCol | time | NO | MUL | NULL | |
| dateEndCol | date | NO | MUL | NULL | |
| timeEndCol | time | NO | MUL | NULL | |
| statusCode | int(1) | NO | MUL | NULL | |
| uRL | text | NO | | NULL | |
| hostname | varchar(200) | YES | MUL | NULL | |
| IPAddress | varchar(25) | YES | | NULL | |
| cookieVal | varchar(100) | NO | | NULL | |
| keywordVal | varchar(60) | NO | | NULL | |
| dateTimeCol | datetime | NO | MUL | NULL | |
+---------------------------+--------------+------+-----+---------+----------------+
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| tableName | 0 | PRIMARY | 1 | mainID | A | 661990 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID1 | 1 | otherID1 | A | 330995 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID2 | 1 | otherID2 | A | 25 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID3 | 1 | otherID3 | A | 48 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_dateStartCol | 1 | dateStartCol | A | 187 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_timeStartCol | 1 | timeStartCol | A | 73554 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_dateEndCol | 1 | dateEndCol | A | 188 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_timeEndCol | 1 | timeEndCol | A | 73554 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_keyword | 1 | keyword | A | 82748 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_hostname | 1 | hostname | A | 2955 | NULL | NULL | YES | BTREE | |
| tableName | 1 | idx_dateTimeCol | 1 | dateTimeCol | A | 220663 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_statusCode | 1 | statusCode | A | 2 | NULL | NULL | | BTREE | |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
Explain Output:
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | tableName | range | idx_otherID3,idx_dateStartCol | idx_dateStartCol | 3 | NULL | 66875 | 75.00 | Using where |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
If that is really your query (and not a simplified version of same), then this ought to achieve best results:
CREATE INDEX table_ndx on tableName( otherID3, dateStartCol, mainID);
The first index entry means that the first match in the WHERE is very fast; the same also applies with dateStartCol. The third field is very small and does not slow the index appreciably, but allows for the datum you require to be found immediately in the index with no table access at all.
It is important that the keys are in the same index. In the EXPLAIN you posted, each key is in an index of its own, so even if MySQL chooses the best index, the performances will not be optimal. I'd try and use less indexes, for they also have a cost (shameless plug: Can Indices actually decrease SELECT performance? ).
First try to add the right key. It seems like dateStartCol is more selective than otherID3
ALTER TABLE tableName ADD KEY idx_dates(dateStartCol, dateStartCol)
Second - please make sure you select only rows you need by adding LIMIT clause to the SELECT. This will should up the query. Try like this:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31'
LIMIT 10;
Please also make sure that your MySQL tuned up properly. You may want to check key_buffer_size and innodb_buffer_pool_size as described in http://astellar.com/2011/12/why-is-stock-mysql-slow/
If this is a recurrent or important query then create a multiple column index:
CREATE INDEX index_name ON tableName (otherID3, dateStartCol)
Delete the non used indexes as they make table changes more expensive.
BTW you don't need two separate columns for date and time. You can combine then in a datetime or timestamp type. One less column and one less index.
The explain output shows it chose the dateStartCol index so you could try the opposite I suggested above:
CREATE INDEX index_name ON tableName (dateStartCol, otherID3)
Notice that the query's dateStartCol condition will still get 75% of the rows so not much improvement, if any, in using that single index.
How unique is otherID3? If there are not many repeated otherID3 you can hint the engine to use it.

MySQL specific query performance tuning

I have troubles with MySQL query performance.
Table (InnoDB):
+--------------------+---------------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------------------+------+-----+-------------------+-------+
| st_resource_id | varchar(32) | NO | MUL | NULL | |
| st_sub_resource_id | varchar(32) | YES | | NULL | |
| st_title | varchar(500) | YES | | NULL | |
| st_resource_type | varchar(100) | NO | MUL | NULL | |
| st_site_id | tinyint(4) | NO | MUL | NULL | |
| st_time | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| st_user_id | int(10) unsigned | YES | | NULL | |
| st_full_access | tinyint(1) unsigned | YES | | NULL | |
+--------------------+---------------------+------+-----+-------------------+-------+
Indexes:
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| nr_statistics | 1 | resource_id | 1 | st_resource_id | A | 1546165 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | resource_id | 2 | st_sub_resource_id | A | 1546165 | NULL | NULL | YES | BTREE | |
| nr_statistics | 1 | st_time | 1 | st_time | A | 1546165 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id | 1 | st_site_id | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_resource_type | 1 | st_resource_type | A | 16 | 10 | NULL | | BTREE | |
+---------------+------------+------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
Query:
SELECT st_resource_id AS docId, count(*) AS cnt
FROM nr_statistics
WHERE
st_resource_type = 'document'
AND st_sub_resource_id = 'text'
AND st_time > DATE_SUB(NOW(), INTERVAL 7 DAY)
AND st_site_id = 1
GROUP BY st_resource_id
ORDER BY cnt DESC
LIMIT 0, 5;
Query plan:
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | nr_statistics | index | st_time,st_site_id,st_resource_type | resource_id | 197 | NULL | 1581044 | Using where; Using temporary; Using filesort |
+----+-------------+---------------+-------+-------------------------------------+-------------+---------+------+---------+----------------------------------------------+
Table has ~1,666,383 rows. The query runs extremely slow. In MySQL process list I see this query in "copy to tmp table phase" for a long time (> 1 minute). Query generates heavy I/O load. I can't understand what to do to fix the problem and speed-up query execution.
If the problem is a result of wrong indexes, so what indexes will be right?
UPD. I created new composite index:
| nr_statistics | 1 | st_site_id_2 | 1 | st_site_id | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 2 | st_resource_type | A | 16 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 3 | st_sub_resource_id | A | 752018 | NULL | NULL | YES | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 4 | st_time | A | 1504037 | NULL | NULL | | BTREE | |
| nr_statistics | 1 | st_site_id_2 | 5 | st_resource_id | A | 1504037 | NULL | NULL | | BTREE | |
Now query plan is:
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
| 1 | SIMPLE | nr_statistics | range | st_site_id_2 | st_site_id_2 | 406 | NULL | 21168 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+---------------+-------+---------------+--------------+---------+------+-------+-----------------------------------------------------------+
The query now runs very fast (as 0.0x sec), but I have to force using new index:
SELECT st_resource_id as docId, count( * ) AS Cnt
FROM nr_statistics
USE INDEX (st_site_id_2)
WHERE st_resource_type = 'document'
AND st_sub_resource_id = 'text'
AND st_time > DATE_SUB( NOW( ) , INTERVAL 7 DAY )
AND st_site_id = 1
GROUP BY st_resource_id
ORDER BY cnt DESC
LIMIT 0 , 5;
While the problem is resolved (not beautiful but effective way), I still have some open questions (see comments).
Create a composite index on (st_site_id, st_resource_type, st_sub_resourse_id, st_time, st_resource_id).
However, you will still have temporary and filesort in the plan because you are ordering on COUNT(*) which is not indexable.
If you need to run this query fast and often, you would have to create an aggregate table which would store counts for each site/resource/subresourse/week combination and update it in a trigger.
Did you try creating a composite index on st_resource_type, st_resource_id, st_time and st_site_id? It looks to me like you have several indexes, but most are on a single column, or maybe 2 columns. By having a composite index with more of the columns you need, it may improve the performance.
When doing queries with multiple where clauses the order in which you write them should match the order in which you wrote your query.
In your particular case it would be:
CREATE INDEX stats_index ON nr_statistics (st_resource_type, st_sub_resource_id, st_time, st_site_id);
This should give you a pretty good speed boost.

Index IN and a range

I need to find the best index for this query:
SELECT c.id id, type
FROM Content c USE INDEX (type_proc_last_cat)
LEFT JOIN Battles b ON c.id = b.id
WHERE type = 1
AND processing_status = 1
AND category IN (13, 19)
AND status = 4
ORDER BY last_change DESC
LIMIT 100";
The tables look like this:
mysql> describe Content;
+-------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| type | tinyint(3) unsigned | NO | MUL | NULL | |
| category | bigint(20) unsigned | NO | | NULL | |
| processing_status | tinyint(3) unsigned | NO | | NULL | |
| last_change | int(10) unsigned | NO | | NULL | |
+-------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 1 | type | A | 4 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 2 | processing_status | A | 20 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 3 | last_change | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_last_cat | 4 | category | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> describe Battles;
+---------------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| status | tinyint(4) unsigned | NO | | NULL | |
| status_last_changed | int(11) unsigned | NO | | NULL | |
+---------------------+---------------------+------+-----+---------+-------+
mysql> show indexes from Battles;
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Battles | 0 | PRIMARY | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 1 | id | A | 1215 | NULL | NULL | | BTREE | |
| Battles | 0 | id_status | 2 | status | A | 1215 | NULL | NULL | | BTREE | |
+---------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
And I get output like this:
mysql> explain
-> SELECT c.id id, type
-> FROM Content c USE INDEX (type_proc_last_cat)
-> LEFT JOIN Battles b USE INDEX (id_status) ON c.id = b.id
-> WHERE type = 1
-> AND processing_status = 1
-> AND category IN (13, 19)
-> AND status = 4
-> ORDER BY last_change DESC
-> LIMIT 100;
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
| 1 | SIMPLE | c | ref | type_proc_last_cat | type_proc_last_cat | 2 | const,const | 1352 | Using where; Using index |
| 1 | SIMPLE | b | eq_ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+--------+--------------------+--------------------+---------+-----------------------+------+--------------------------+
The trouble is the rows count for the Content table. It appears MySQL is unable to effectively use both last_change and category in the type_proc_last_cat index. If I switch the order of last_change and category, fewer rows are selected but it results in a filesort for the ORDER BY, meaning it pulls all the matching rows from the database. That is worse, since there are 100,000+ rows in both tables.
Tables are both InnoDB, so keep in mind that the PRIMARY key is appended to every other index. So the index the index type_proc_last_cat above behaves likes it's on (type, processing_status, last_change, category, id). I am aware I could change the PRIMARY key for Battles to (id, status) and drop the id_status index (and I may just do that).
Edit: Any value for type, category, processing_status, and status is less than 20% of the total values. last_change and status_last_change are unix timestamps.
Edit: If I use a different index with category and last_change in reverse order, I get this:
mysql> show indexes from Content;
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Content | 0 | PRIMARY | 1 | id | A | 4115 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 1 | type | A | 6 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 2 | processing_status | A | 26 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 3 | category | A | 228 | NULL | NULL | | BTREE | |
| Content | 1 | type_proc_cat_last | 4 | last_change | A | 4115 | NULL | NULL | | BTREE | |
+---------+------------+---------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain SELECT c.id id, type FROM Content c USE INDEX (type_proc_cat_last) LEFT JOIN Battles b
USE INDEX (id_status) ON c.id = b.id WHERE type = 1 AND processing_status = 1 AND category IN (13, 19) AND status = 4 ORDER BY last_change DESC LIMIT 100;
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
| 1 | SIMPLE | c | range | type_proc_cat_last | type_proc_cat_last | 10 | NULL | 165 | Using where; Using index; Using filesort |
| 1 | SIMPLE | b | ref | id_status | id_status | 9 | wtm_master.c.id,const | 1 | Using where; Using index |
+----+-------------+-------+-------+--------------------+--------------------+---------+-----------------------+------+------------------------------------------+
The filesort worries me as it tells me MySQL pulls all the matching rows first, before sorting. This will be a big problem when there are 100,000+.
rows field in EXPLAIN doesn't reflect the number of rows that actually were read. It reflects the number of rows that possible will be affected. Also it doesn't rely on LIMIT, because LIMIT is applied after the plan has been calculated.
So you don't need to worry about it.
Also I would suggest you to swap last_change and category in type_proc_last_cat so mysql can try to use the last index part (last_change) for sorting purposes.