mysql need to run optimize every day after batch jobs - mysql

I have a table with about 4M rows. Every night, about 15 batch jobs run on the data, with a few hundred thousand inserts and updates. The problem is, when I run a simple count query such as
select count(*) from items;
I have to wait for about 15 minutes for it to return. After researching on SO, I see that
optimize table items;
does seem to fix the problem, after running it, the above query returns instantly. The problem is, it takes 17 hours to run. Any suggestions on what to look for to figure out why this is happening and how to fix it?
Thanks for any help,
Kevin
UPDATE:
Here's what happens when I optimize:
mysql> optimize table items;
+------------------------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------------+----------+----------+-------------------------------------------------------------------+
| g_production.items | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| g_production.items | optimize | status | OK |
+------------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (9 hours 20 min 48.36 sec)
Also, strangely, the select is not using the primary index, ID:
explain select count(id) from items;
+----+-------------+-------+-------+---------------+--------------------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------------------- +---------+------+----------+-------------+
| 1 | SIMPLE | items | index | NULL | index_items_on_real_sale | 2 | NULL | 45152757 | Using index |
+----+-------------+-------+-------+---------------+--------------------------+---------+------+----------+-------------+
1 row in set (0.10 sec)
And finally, here are all the indexes on the table:
+-------+------------+---------------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+---------------------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| items | 0 | PRIMARY | 1 | id | A | 47144790 | NULL | NULL | | BTREE | | |
| items | 1 | index_items_on_affiliate_id | 1 | affiliate_id | A | 47144790 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_brand_id | 1 | brand_id | A | 1024886 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_real_sale | 1 | real_sale | A | 18 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_retailer_id_and_affiliate_id | 1 | retailer_id | A | 18 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_retailer_id_and_affiliate_id | 2 | affiliate_id | A | 47144790 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_retailer_id | 1 | retailer_id | A | 40021 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_shopzilla_id | 1 | shopzilla_id | A | 457716 | NULL | NULL | YES | BTREE | | |
| items | 1 | index_items_on_updated_at | 1 | updated_at | A | 6734970 | NULL | NULL | | BTREE | | |
Note the cardinality on the index that the EXPLAIN is revealing, I have 4M rows, but explain says it's using index_items_on_real_sale, which the show indexes command reveals has a cardinality of 18. Could this be the problem?

It could be quite a few things, but I'm wondering if it's indexed properly. Also, try to run the query with explain, like so:
EXPLAIN SELECT a,b,c WHERE....
Look at the output and see how many rows it's reading to process the query and they type of indexes etc...
Definitely need more information in order to help out, I'm just guessing based on the limited information you provided.

Related

Why does MySQL not use optimal indexes

I'm trying to optimize my query, however, MySQL seems to be utilizing non-optimal indexes on the query and I can't seem to figure out what is wrong. My query is as follows:
SELECT SQL_CALC_FOUND_ROWS deal_ID AS ID,dealTitle AS dealSaving,
storeName AS title,deal_URL AS dealURL,dealDisclaimer,
dealType, providerName,providerLogo AS providerIMG,createDate,
latitude AS lat,longitude AS lng,'local' AS type,businessType,
address1,city,dealOriginalPrice,NULL AS dealDiscountPercent,
dealPrice,scoringBase, smallImage AS smallimage,largeImage AS image,
storeURL AS storeAlias,
exp(-power(greatest(0,
abs(69.0*DEGREES(ACOS(0.82835377099147 *
COS(RADIANS(latitude)) * COS(RADIANS(-118.4-longitude)) +
0.56020534635454*SIN(RADIANS(latitude)))))-2),
2)/(5.7707801635559)) *
scoringBase * IF(submit_ID IN (18381),
IF(businessType = 1,1.3,1.2),IF(submit_ID IN (54727),1.19, 1)
) AS distance
FROM local_deals
WHERE latitude BETWEEN 33.345362318841 AND 34.794637681159
AND longitude BETWEEN -119.61862872928 AND -117.18137127072
AND state = 'CA'
AND country = 'US'
ORDER BY distance DESC
LIMIT 48 OFFSET 0;
Listing the indexes on the table reveals:
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| local_deals | 0 | PRIMARY | 1 | id | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 0 | unique_deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | store_ID | 1 | store_ID | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeOnline_ID | 1 | storeOnline_ID | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeChain_ID | 1 | storeChain_ID | A | 117 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | userProvider_ID | 1 | userProvider_ID | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | expirationDate | 1 | expirationDate | A | 3127 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | createDate | 1 | createDate | A | 96946 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | city | 1 | city | A | 17626 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | state | 1 | state | A | 138 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | zip | 1 | zip | A | 38778 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | country | 1 | country | A | 39 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | latitude | 1 | latitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | longitude | 1 | longitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | eventDate | 1 | eventDate | A | 4215 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | isNowDeal | 1 | isNowDeal | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | businessType | 1 | businessType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | dealType | 1 | dealType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | submit_ID | 1 | submit_ID | A | 5 | NULL | NULL | YES | BTREE | | |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Running explain extended reveals:
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | ref | state,country,latitude,longitude | state | 35 | const | 52472 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
There are around 200k rows in the table. What is strange is that it is ignoring the latitude and longitude indexes as those should filter the table more. Running a query where I remove the "state" and "country" where commands reveals the following explain:
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | range | latitude,longitude | longitude | 5 | NULL | 30662 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
This shows that the longitude index would better filter the table to 30,662 rows. Am I missing something here? How can I get MySQL to use all queries. Note that the table is InnoDB and I'm using MySQL 5.5.
The best index for your query is a composite index on (country, state, latitude, longitude) (country and state could be swapped). MySQL has good documentation on multi-column indexes, which is here.
Basically, latitude and longitude are not particularly selective individually. Unfortunately, the standard B-tree index only supports one inequality, and your query has two.
Actually, if you want GIS processing, then you should use a spatial extension to MySQL.
Depending on the size of your table, Gordon's suggested index may be "good enough". If you need to get even more performance, you need to go to a 2D partitioning technique, wherein you partition on latitude and arrange for the InnoDB PRIMARY KEY to begin with longitude. More details, and sample code, are available in my article.
A generic technique for problems like this is to build a subquery with these properties:
It returns no more than LIMIT rows; and those are all that you need.
There is a "covering index" for the columns involved, plus the PRIMARY KEY.
You are using InnoDB.
Something like
SELECT b. ..., a.distance
FROM local_deals b
JOIN (
SELECT id,
(...) AS distance,
FROM local_deals
WHERE latitude BETWEEN 33.34536 AND 34.79464
AND longitude BETWEEN -119.61863 AND -117.18137
AND state = 'CA'
AND country = 'US'
ORDER BY distance ASC
LIMIT 48 OFFSET 0
) AS a ON b.id = a.id
ORDER BY a.distance;
INDEX(country, state, latitude, longitude, id) -- `id` is the PK
-- country and state first (because of '='); id last.
Why this helps...
The index is "covering", so the lengthy scan (of a lot more than 48 rows) is done entirely in the index's BTree. This cuts down on I/O for huge tables.
All the other fields (b.*) are not hauled around through tmp tables, etc. Only 48 are sets of those fields are dealt with.
The 48 lookups by id are especially efficient in InnoDB due to the "clustered PK".
When working with "huge" tables, where I/O dominates, this technique can be counted thus:
1, or a small number of, blocks in the index are needed for the subquery. Note that the desired records are consecutive, or nearly so. (OK, if there are 30K to look through, it could be more than 100 blocks; hence my comment about shrinking the bounding box to start with.)
Then 48 (LIMIT) random fetches via id get the 48 rows.
Without the subquery, the bulky rows need to be fetched. And, depending on the index used, that could be up to 30K blocks fetched. That's orders of magnitude slower.
Also, 48 rows versus 30K rows will be written to a tmp table for sorting (ORDER BY).

MySQL Query Performance - Query/Schema/Indexes?

Basically having some performance issues with queries, mainly to my largest table which holds call data.
The main query contains quite a few left joins & sub-selects, but in a scenario where I'm running a query where I expect back 1.3M calls to be returned, the query is just not doing it. Having to stop it at 7 minutes means there's definately a problem somewhere.
I've narrowed down the main query and tested the simplest sub-select join which is
SELECT
DateStart,
ID,
NumbID,
EffectiveFlag,
OrigNumber
FROM calls
WHERE
DateStart <= '2013-12-31'
AND DateStart >= '2013-01-01'
AND CallLength >= '00:00:00'
AND Direction = '1'
AND CustID IN (474,482,250,268,197,604,132,359,279,441,118,448,152,133,380,162,249,679,226,259,2450,2408,2451,2453,2439,2454,2444,2445,2452)
And even that query takes 4.5s - so when it's a sub-select in a query with other joins & sub-selected, I can imagine why the query as a whole is unusable.
The explain statement for the above query is
+----+-------------+-------+-------+-------------------------------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
| 1 | SIMPLE | calls | range | idx_CustID,idx_DateStart,idx_CustID_DateStart,idx_CustID_TermNumber,idx_Direction | idx_CustID_DateStart | 7 | NULL | 1660009 | Using where |
+----+-------------+-------+-------+-------------------------------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
The database schema of the calls table is
+-------------------+-------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------------------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| CustID | int(11) | NO | MUL | 0 | |
| CarrID | int(11) | NO | MUL | NULL | |
| TariID | int(11) | NO | MUL | 0 | |
| CarrierRef | varchar(30) | NO | MUL | | |
| NumbID | int(11) | NO | MUL | 0 | |
| VlviID | int(11) | NO | MUL | NULL | |
| VcamID | int(11) | NO | MUL | NULL | |
| SomeID | int(11) | NO | MUL | NULL | |
| VlnsID | int(11) | NO | MUL | NULL | |
| NGNumber | varchar(12) | NO | | | |
| OrigNumber | varchar(16) | NO | MUL | NULL | |
| CLIRestrictedFlag | int(2) | NO | | NULL | |
| OrigLocality | varchar(11) | NO | MUL | | |
| OrigAreaCode | varchar(11) | NO | MUL | | |
| TermNumber | varchar(16) | NO | MUL | NULL | |
| BatchNumber | varchar(10) | NO | MUL | | |
| DateStart | date | NO | MUL | 0000-00-00 | |
| DateClear | date | NO | | 0000-00-00 | |
| TimeStart | time | NO | | 00:00:00 | |
| TimeClear | time | NO | | 00:00:00 | |
| CallLength | time | NO | | 00:00:00 | |
| RingLength | time | NO | | 00:00:00 | |
| EffectiveFlag | smallint(1) | NO | MUL | NULL | |
| UnansweredFlag | smallint(1) | NO | MUL | NULL | |
| EngagedFlag | smallint(1) | NO | | NULL | |
| RecID | int(11) | NO | MUL | NULL | |
| CreatedUserID | int(11) | NO | | 0 | |
| CreatedDatetime | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| Direction | int(1) | NO | MUL | NULL | |
+-------------------+-------------+------+-----+---------------------+----------------+
The indexes on the calls table are
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| calls | 0 | PRIMARY | 1 | ID | A | 23905312 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID | 1 | CustID | A | 1685 | NULL | NULL | | BTREE | |
| calls | 1 | idx_NumbID | 1 | NumbID | A | 37765 | NULL | NULL | | BTREE | |
| calls | 1 | idx_OrigNumber | 1 | OrigNumber | A | 5976328 | NULL | NULL | | BTREE | |
| calls | 1 | idx_OrigLocality | 1 | OrigLocality | A | 45019 | NULL | NULL | | BTREE | |
| calls | 1 | idx_OrigAreaCode | 1 | OrigAreaCode | A | 846 | NULL | NULL | | BTREE | |
| calls | 1 | idx_TermNumber | 1 | TermNumber | A | 232090 | NULL | NULL | | BTREE | |
| calls | 1 | idx_DateStart | 1 | DateStart | A | 4596 | NULL | NULL | | BTREE | |
| calls | 1 | idx_EffectiveFlag | 1 | EffectiveFlag | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_UnansweredFlag | 1 | UnansweredFlag | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_EngagedFlag | 1 | UnansweredFlag | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_TariID | 1 | TariID | A | 110 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_DateStart | 1 | CustID | A | 1685 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_DateStart | 2 | DateStart | A | 919435 | NULL | NULL | | BTREE | |
| calls | 1 | idx_NumbID_DateStart | 1 | NumbID | A | 37765 | NULL | NULL | | BTREE | |
| calls | 1 | idx_NumbID_DateStart | 2 | DateStart | A | 5976328 | NULL | NULL | | BTREE | |
| calls | 1 | idx_RecID | 1 | RecID | A | 288015 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CarrierRef | 1 | CarrierRef | A | 7968437 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_CallTermNumber | 1 | CustID | A | 1685 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_CallTermNumber | 2 | TermNumber | A | 246446 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CreatedDatetime | 1 | CreatedDatetime | A | 771139 | NULL | NULL | | BTREE | |
| calls | 1 | idx_Direction | 1 | Direction | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_VlviID | 1 | VlviID | A | 50539 | NULL | NULL | | BTREE | |
| calls | 1 | idx_SomeID | 1 | SomeID | A | 30 | NULL | NULL | | BTREE | |
| calls | 1 | idx_VcamID | 1 | VcamID | A | 64 | NULL | NULL | | BTREE | |
| calls | 1 | idx_VlnsID | 1 | VlnsID | A | 191 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CarrID | 1 | CarrID | A | 4 | NULL | NULL | | BTREE | |
| calls | 1 | idx_BatchNumber | 1 | BatchNumber | A | 271651 | NULL | NULL | | BTREE | |
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
Something which I understand may be causing the performance, is the indexes on columns with a low cardinality. I know columns such as Direction which has a cardinality of 2 is actually probably worse of with an index in terms of performance, but that alone shouldn't be making the statement so slow.
In terms of the cardinality requirements to have a worthwhile index, is there a general cardinality percentage compared to total table records at which an index increases performance and when it reduces performance?
I understand that no one is going to be able to throw an answer at me that will change the query time from 4.5s to 0.01s, but any advice on either the query itself, the table schema, the indexes, or the hardware would be greatly appreciated.
Update:
#Sebas "could you please rerun the query AND explain plan without the part: AND CallLength >= '00:00:00' AND Direction = '1' please?"
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
| 1 | SIMPLE | calls | range | idx_CustID,idx_DateStart,idx_CustID_DateStart,idx_CustID_TermNumber | idx_CustID_DateStart | 7 | NULL | 724813 | Using where |
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
Is your "DateStart" a truncated datetime -- keep date only? If not, you may want to build one with truncated value (by day, or hour), and use int datatype, which will make the index much smaller for faster query.
Or, another way to optimize (golden rule #1 don't do it, #2 do't do it now).
If and only if your date and PK are sync in sequence, you can build a external index of Range of StartDate <=> ID (PK).
and using below pattern
SELECT #start:=ID_START FROM ANOTHER_TABLE WHERE StartDate='2013-01-01'
SELECT #end:=ID_END FROM ANOTHER_TABLE WHERE StartDate='2013-12-31'
SELECT * FROM calls WHERE ID BETWEEN #start and #end AND CustId in (xxxxx) ....
By using above pattern, Mysql will know if has to scan only a segment of table.
Like Darhazer said, you have way too many indexes, start by removing all of them and build them up again based on your needs.
For this specific query, create one INDEX with these fields in it:
DateStart
CallLength
Direction
CustID
Change AND Direction = '1' to AND Direction = 1 (remove the quotes, you're comparing an integer, not a string)
And see what this does to your query time. If this goes well, add the subquery, check it again with EXPLAIN, add the needed indexes and so on.
The best index that your query should be hitting is idx_CustID_DateStart. The IN statement is preventing that from happening. If the CustID list is from a table, I suggest JOIN it in, instead of enumerating.
I am not sure that the original query that takes more than 7 minutes, is written correctly when you are worried by a subquery which takes 5 seconds (hopefully it is not executed for each row). But anyway, if you want to speed up this one you should read something how indexes work. I would recommend this article to begin with.
Basically you have conditions on 4 fields, and on two fields those are range conditions. If you have read the article, you know that the index is effectively used until the first range condition is met. Though, the rest of the data in index can be used for index scanning. Thus, you need to choose which condition is better narrowing the resultset: on DateStart or on CallLength.
Anyway, you need a composite index that starts with (CustID, Direction .... My feeling is that a condition on the DateStart is better. So I would start with (CustID, Direction, DateStart, CallLength), and compare it with (CustID, Direction, DateStart), because the last field might not give a sufficient performance gain, but will take memory resources.
Though I still think, one should be sure that the rest of the query is written correctly when concentrating on a subquery. There might be a properer way to organize the query, so that this optimization would turn out irrelevant.
4.5s is not much for a 1.6m rows returned, I am pretty sure it's all spent on IO operations. Then there is hardly any space left for optimisation. You'd better present us your original query, may be we can help better.
What % of total those 1.6m makes? Indexes are good if they're used to return smallest part of dataset, but since their data access pattern with mrr is random reads, its sometimes more efficient using fullscan on a table. Surely it depends on how data has been added to the table and how space was allocated on disk.
Also you might find useful to monitor performance with MySQL performance schema, look here for details.
You have too much indexes. For example, you don't need a separate CustID Index, becuase it's the left-most in the CustID,DateStart. You have 2 indexes on UnansweredFlag. And do you really need all of those indexes? This not only slows down the inserts/updates, it also slows down the optimzier and may trick the optimizer to choose not-so-good index.
Now, on the specific query. You need to see what field or combination limits the query the most (as now it scans 1,6M rows!) and force it to use that index. So run SELECT COUNT(*) queries for each of the where clauses (direction, call length) with the DateStart specified (you'll always want to limit based on this). Maybe you just need to add the direction to the index.
Also, before MySQL 5.6, subqueries in the WHERE clause are not optimized, so maybe you should rewrite the entire query to use join instead of subselect, and not to optimize the particular query

What's the difference between these two indexed MySQL tables?

I'm working on modifying an old MySQL database which turned out to be designed improperly for the sort of data it was storing. I'm not very familiar with SQL at all, so I used SHOW CREATE TABLE to get the CREATE statement used for the old table ('interaction_old') and copied it almost exactly, with the only changes being a few of the column names and data types, to make a new table ('interaction_new'). Now some queries which were using indexes in the old table no longer use indexes in the new table, and I can't figure out why.
Here are the indexes from both tables:
mysql> SHOW KEYS FROM interaction_old;
+-----------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| interaction_old | 0 | PRIMARY | 1 | interactionid | A | 138996006 | NULL | NULL | | BTREE | |
| interaction_old | 1 | Complex_pdbid | 1 | Complex_pdbid | A | 1338 | NULL | NULL | | BTREE | |
| interaction_old | 1 | Protein_id | 1 | Protein_id | A | 13737 | NULL | NULL | | BTREE | |
| interaction_old | 1 | RNA_id | 1 | RNA_id | A | 2806 | NULL | NULL | | BTREE | |
+-----------------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> SHOW KEYS FROM interaction_new;
+-----------------+------------+------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| interaction_new | 0 | PRIMARY | 1 | interactionid | A | 152311144 | NULL | NULL | | BTREE | |
| interaction_new | 1 | pdbid | 1 | pdbid | A | 2924 | NULL | NULL | | BTREE | |
| interaction_new | 1 | pchainname | 1 | pchainname | A | 472 | NULL | NULL | | BTREE | |
| interaction_new | 1 | rchainname | 1 | rchainname | A | 487 | NULL | NULL | | BTREE | |
+-----------------+------------+------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
And an example query that is behaving differently between the two:
mysql> EXPLAIN SELECT DISTINCT Complex_pdbid FROM interaction_old;
+----+-------------+-----------------+-------+---------------+---------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------------+---------+------+------+--------------------------+
| 1 | SIMPLE | interaction_old | range | NULL | Complex_pdbid | 6 | NULL | 1339 | Using index for group-by |
+----+-------------+-----------------+-------+---------------+---------------+---------+------+------+--------------------------+
mysql> EXPLAIN SELECT DISTINCT pdbid FROM interaction_new;
+----+-------------+-----------------+------+---------------+------+---------+------+-----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-----------+-----------------+
| 1 | SIMPLE | interaction_new | ALL | NULL | NULL | NULL | NULL | 152311144 | Using temporary |
+----+-------------+-----------------+------+---------------+------+---------+------+-----------+-----------------+
As you might expect, the query on interaction_old finishes in a fraction of a second, whereas I've let the query on interaction_new run for ~20 minutes before killing it. interaction_old.Complex_pdbid and interaction_new.pdbid are the same data type (and are storing almost exactly the same data). USE INDEX and/or FORCE INDEX doesn't seem to have any effect. What's causing the different behavior?
Edit: According to the documentation, the first table uses a loose index scan to increase speed -- nothing from that page makes it clear to me why this doesn't work on the second table, though.

Avoid "Using temporary" and "Using filesort" with ORDER BY

sorry but after browsing nearly every posts and questions about it, I still can't manage to get rid of "Using temporary" and "Using filesort" in a simple query. I know this is a problem of keys but I can't find the right combination...
I also don't know if the order of the join defined by the optimizer is ok, I tested other orders using STRAIGHT_JOIN but nothing better... The query is pretty slow using ORDER BY, but really fast without it and of course without "Using temporary" and "Using filesort"! (there is something like 100.000 rows in points table)
The query :
SELECT points.id,
points.id_owner,
points.point_title,
points.point_desc,
users.user_id,
users.username
FROM points,
JOIN users ON points.id_owner = users.user_id
JOIN follows ON follows.id_followed = points.id_owner
WHERE points.deleted = 0
AND follows.id_follower = 22
ORDER BY points.id DESC
LIMIT 10
the explain :
+----+-------------+---------+--------+---------------+------------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+------------+---------+---------------------+------+----------------------------------------------+
| 1 | SIMPLE | follows | ref | FOLLOW_DUO | FOLLOW_DUO | 4 | const | 2 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | follows.id_followed | 1 | |
| 1 | SIMPLE | points | ref | GETPOINT1 | GETPOINT1 | 5 | users.user_id,const | 460 | Using where |
+----+-------------+---------+--------+---------------+------------+---------+---------------------+------+----------------------------------------------+
And here is the SHOW INDEX from the three tables :
SHOW INDEX FROM points
+--------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| points | 0 | PRIMARY | 1 | id | A | 91987 | NULL | NULL | | BTREE | |
| points | 0 | GETPOINT1 | 1 | id_owner | A | NULL | NULL | NULL | | BTREE | |
| points | 0 | GETPOINT1 | 2 | deleted | A | NULL | NULL | NULL | | BTREE | |
| points | 0 | GETPOINT1 | 3 | id | A | 91987 | NULL | NULL | | BTREE | |
+--------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM users
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| users | 0 | PRIMARY | 1 | user_id | A | 4 | NULL | NULL | | BTREE | |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM follows
+---------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| follows | 0 | PRIMARY | 1 | id | A | 5 | NULL | NULL | | BTREE | |
| follows | 0 | FOLLOW_DUO | 1 | id_follower | A | NULL | NULL | NULL | | BTREE | |
| follows | 0 | FOLLOW_DUO | 2 | id_followed | A | 5 | NULL | NULL | | BTREE | |
| follows | 1 | id_follower | 1 | id_follower | A | NULL | NULL | NULL | | BTREE | |
| follows | 1 | id_followed | 1 | id_followed | A | NULL | NULL | NULL | | BTREE | |
+---------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
From now I don't know what to test to try to avoid the "Using temporary" and "Using filesort"... So if you have an idea for me... Thank you in advance for your help !
Looks like so many rows are being examined from points table. I had tried following trick to avoid temporary table usage in my project. Please do as follows and give it an explain to see any improvement:
Delete all indexes called 'GETPOINT1' except Primary Key Index form points table.
Add covering index on columns (deleted, id_owner). Please keep the order of columns as mentioned.
If you still don't see any improvement, remove above index and add index again in order (id, deleted, id_owner) and (deleted, id_owner, id) columns and try again
In addition you may remove follows.id_follower = 22 from where clause and put it in join condition like JOIN follows ON follows.id_followed = points.id_owner AND follows.id_follower = 22
Please also add index in order as (id_follower, id_owner) in follows table.
I do not guarantee but above should be able to give you improvements.

Simple MySQL query with performance issues

I have the following simple MySQL query:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31';
When I run this it takes 0.29 seconds to bring back 36074 results. When I increase my date period to bring back more results (65703) it runs in 0.56. When I run other similar SQL queries on the same server but on different tables (some tables are larger) the results come back in approximately 0.01 seconds.
Although 0.29 isn't slow - this is a basic part for a complex query and this timing means that it is not scalable.
See below for the table definition and indexes.
I know it's not server load as I have the same issue on a development server which has very little usage.
+---------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+--------------+------+-----+---------+----------------+
| mainID | int(11) | NO | PRI | NULL | auto_increment |
| otherID1 | int(11) | NO | MUL | NULL | |
| otherID2 | int(11) | NO | MUL | NULL | |
| otherID3 | int(11) | NO | MUL | NULL | |
| keyword | varchar(200) | NO | MUL | NULL | |
| dateStartCol | date | NO | MUL | NULL | |
| timeStartCol | time | NO | MUL | NULL | |
| dateEndCol | date | NO | MUL | NULL | |
| timeEndCol | time | NO | MUL | NULL | |
| statusCode | int(1) | NO | MUL | NULL | |
| uRL | text | NO | | NULL | |
| hostname | varchar(200) | YES | MUL | NULL | |
| IPAddress | varchar(25) | YES | | NULL | |
| cookieVal | varchar(100) | NO | | NULL | |
| keywordVal | varchar(60) | NO | | NULL | |
| dateTimeCol | datetime | NO | MUL | NULL | |
+---------------------------+--------------+------+-----+---------+----------------+
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
| tableName | 0 | PRIMARY | 1 | mainID | A | 661990 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID1 | 1 | otherID1 | A | 330995 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID2 | 1 | otherID2 | A | 25 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_otherID3 | 1 | otherID3 | A | 48 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_dateStartCol | 1 | dateStartCol | A | 187 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_timeStartCol | 1 | timeStartCol | A | 73554 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_dateEndCol | 1 | dateEndCol | A | 188 | NULL | NULL | | BTREE | |
|tableName | 1 | idx_timeEndCol | 1 | timeEndCol | A | 73554 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_keyword | 1 | keyword | A | 82748 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_hostname | 1 | hostname | A | 2955 | NULL | NULL | YES | BTREE | |
| tableName | 1 | idx_dateTimeCol | 1 | dateTimeCol | A | 220663 | NULL | NULL | | BTREE | |
| tableName | 1 | idx_statusCode | 1 | statusCode | A | 2 | NULL | NULL | | BTREE | |
+--------------------+------------+-------------------------------+--------------+---------------------------+-----------+-------------+----------+--------+------+------------+---------+
Explain Output:
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | tableName | range | idx_otherID3,idx_dateStartCol | idx_dateStartCol | 3 | NULL | 66875 | 75.00 | Using where |
+----+-------------+-----------+-------+----------------------------------+-------------------+---------+------+-------+----------+-------------+
If that is really your query (and not a simplified version of same), then this ought to achieve best results:
CREATE INDEX table_ndx on tableName( otherID3, dateStartCol, mainID);
The first index entry means that the first match in the WHERE is very fast; the same also applies with dateStartCol. The third field is very small and does not slow the index appreciably, but allows for the datum you require to be found immediately in the index with no table access at all.
It is important that the keys are in the same index. In the EXPLAIN you posted, each key is in an index of its own, so even if MySQL chooses the best index, the performances will not be optimal. I'd try and use less indexes, for they also have a cost (shameless plug: Can Indices actually decrease SELECT performance? ).
First try to add the right key. It seems like dateStartCol is more selective than otherID3
ALTER TABLE tableName ADD KEY idx_dates(dateStartCol, dateStartCol)
Second - please make sure you select only rows you need by adding LIMIT clause to the SELECT. This will should up the query. Try like this:
SELECT SQL_NO_CACHE mainID
FROM tableName
WHERE otherID3=19
AND dateStartCol >= '2012-08-01'
AND dateStartCol <= '2012-08-31'
LIMIT 10;
Please also make sure that your MySQL tuned up properly. You may want to check key_buffer_size and innodb_buffer_pool_size as described in http://astellar.com/2011/12/why-is-stock-mysql-slow/
If this is a recurrent or important query then create a multiple column index:
CREATE INDEX index_name ON tableName (otherID3, dateStartCol)
Delete the non used indexes as they make table changes more expensive.
BTW you don't need two separate columns for date and time. You can combine then in a datetime or timestamp type. One less column and one less index.
The explain output shows it chose the dateStartCol index so you could try the opposite I suggested above:
CREATE INDEX index_name ON tableName (dateStartCol, otherID3)
Notice that the query's dateStartCol condition will still get 75% of the rows so not much improvement, if any, in using that single index.
How unique is otherID3? If there are not many repeated otherID3 you can hint the engine to use it.