Slow query and use of indexes in MySQL - mysql

I have the following query:
SELECT final_query.chr
, final_query.start
, final_query.end
, co.chr
, co.start
, co.end
, final_query.count
FROM (SELECT ed.chr
, ed.start
, ed.end
, case when e.bin1=ed.bin then e.bin2 else e.bin1 end AS target
, count
FROM (SELECT * FROM coordinates
WHERE chr="chr1" AND (start between 3960000 AND 4000000 OR end between 3960000 AND 4000000)
) ed
JOIN counts e ON (e.bin1 = ed.bin OR e.bin2=ed.bin)
ORDER BY count LIMIT 1,20)
AS final_query
JOIN coordinates co ON final_query.target=co.bin;
and the output of EXPLAIN is:
+------+-------------+-------------+--------+---------------+---------+---------+-------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------+--------+---------------+---------+---------+-------+----------+------------------------------------+
| 1 | SIMPLE | e | ALL | bin1,bin2 | NULL | NULL | NULL | 30763816 | Using filesort |
| 1 | SIMPLE | coordinates | ref | PRIMARY,chr | chr | 22 | const | 4929 | Using index condition; Using where |
| 1 | SIMPLE | co | eq_ref | PRIMARY | PRIMARY | 22 | func | 1 | Using where |
+------+-------------+-------------+--------+---------------+---------+---------+-------+----------+------------------------------------+
First I query the table coordinates, whose field chr is indexed. The subquery shown below keeps only the rows that match my conditions.
... (SELECT * FROM coordinates
WHERE chr="chr1" AND (start between 3960000 AND 4000000 OR end between 3960000 AND 4000000)
) ...
This subquery outputs the field bin, which is also indexed. bin links to bin1 and bin2 in table counts, both of which are indexed as well. What I want is to get all rows in table counts that have coordinates.bin in field bin1 or bin2. Why is no index used in this step?
Besides that, I would like to add an ORDER BY to my query, just before the LIMIT clause, but it slows my query down too much. I don't know why, because it has to sort a maximum of 4000 rows...
How can I optimize my query?
My tables, from the DESCRIBE statement:
Table counts
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| bin1 | varchar(20) | NO | MUL | NULL | |
| bin2 | varchar(20) | NO | MUL | NULL | |
| count | float(6,2) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
Table coordinates
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| bin | varchar(20) | NO | PRI | NULL | |
| chr | varchar(20) | NO | MUL | NULL | |
| start | int(11) | NO | | NULL | |
| end | int(11) | NO | | NULL | |
+-------+-------------+------+-----+---------+-------+

Related

MySql JOIN performance go down with group by

I have these tables:
table "f" (26000 record)
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| idFascicolo | int(11) | NO | PRI | | |
| oggetto | varchar | NO |index| | |
+------------------+------------------+------+-----+---------+-------+
table "r" (22000 record)
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| idRichiedente | int(11) | NO | PRI | | |
| name | varchar | NO |index| | |
+------------------+------------------+------+-----+---------+-------+
table "fr" (32000 record)
+------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | | |
| idFascicolo | int(11) | NO |index| | FK |
| idRichiedente | int(11) | NO |index| | FK |
+------------------+------------------+------+-----+---------+-------+
this is my select:
SELECT
f.idFascicolo,
f.oggetto,
r.richiedente
FROM fr
JOIN f ON (f.idFascicolo=fr.idFascicolo)
JOIN r ON (r.idRichiedente=fr.idRichiedente)
WHERE r.name LIKE '%string%'
In the result, I would like to see only one row per f.idFascicolo (I could have "Rossi Mario" and "Rossi Marco" for the same f.idFascicolo), so my new select is:
SELECT
f.idFascicolo,
f.oggetto,
r.richiedente
FROM fr
JOIN f ON (f.idFascicolo=fr.idFascicolo)
JOIN r ON (r.idRichiedente=fr.idRichiedente)
WHERE r.name LIKE '%string%'
GROUP BY f.idFascicolo
Here are the timings reported by phpMyAdmin:
0.0057 seconds: .. WHERE r.name LIKE '%string%'
0.0527 seconds: .. WHERE r.name LIKE '%string%' GROUP BY f.idFascicolo
0.0036 seconds: .. WHERE r.name LIKE 'string%' GROUP BY f.idFascicolo
I don't understand whether the slow query is caused by GROUP BY or by LIKE '%string%' (I need '%string%'; I can't find an equivalent solution with a fulltext index and MATCH .. AGAINST).
This is the explain:
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| id | select type | table | type | possible keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| 1 | simple | r | ALL | PRIMARY | NULL | NULL | NULL | 20925 |Using where; Using temporary; Using filesort |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| 1 | simple | fr | ref |idFascicolo,idRichiedente| idRichiedente | 4 | db.r.idRichiedente | 1 | |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
| 1 | simple | f |eq_ref|PRIMARY | PRIMARY | 4 | db.fr.idFascicolo | 1 | |
+------+-------------+-------+------+-------------------------+---------------+---------+----------------------+-----------+---------------------------------------------+
You have two potential performance issues. First is the GROUP BY. This requires sorting the data, so it has to read all the data and do a lot of work.
The second is the LIKE. There is a fundamental difference between:
WHERE r.name LIKE '%string%'
and
WHERE r.name LIKE 'string%'
The second can use an index on r(name), because the LIKE pattern does not start with a wildcard.
I am not sure what your actual question is. I don't recommend using GROUP BY the way you are using it, because you have unaggregated columns in the SELECT.
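The difference between the two LIKE patterns can be illustrated outside the database. The following is a minimal sketch, not MySQL's implementation: a sorted Python list stands in for a B-tree index on r(name), and the function names and sample data are made up for illustration. A prefix match becomes a range lookup on the sorted data, while an infix match has to inspect every entry. Unlike MySQL's default collations, the comparison here is case-sensitive.

```python
import bisect


def prefix_search(sorted_names, prefix):
    """Range lookup on sorted data, analogous to LIKE 'prefix%' using an index."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # Assumption: no stored name contains the largest code point right after
    # the prefix, so prefix + U+10FFFF is an upper bound for all matches.
    hi = bisect.bisect_left(sorted_names, prefix + "\U0010ffff")
    return sorted_names[lo:hi]


def infix_search(names, needle):
    """Full scan, analogous to LIKE '%needle%': the sort order gives no help."""
    return [name for name in names if needle in name]


# Hypothetical sample data
names = sorted(["Bianchi Luca", "Rossi Marco", "Rossi Mario", "Verdi Rossi"])

prefix_hits = prefix_search(names, "Rossi")
infix_hits = infix_search(names, "Rossi")
print(prefix_hits)
print(infix_hits)
```

The prefix lookup touches only the matching slice of the list; the infix search visits all entries, which is what a full table scan on r does.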

Optimizing join on derived table - EXPLAIN different on local and server

I have the following ugly query, which runs okay but not great, on my local machine (1.4 secs, running v5.7). On the server I'm using, which is running an older version of MySQL (v5.5), the query just hangs. It seems to get caught on "Copying to tmp table":
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_number,
p.street_name,
p.site_address_city_state,
p.number_of_units,
p.number_of_stories,
p.bedrooms,
p.bathrooms,
p.lot_area_sqft,
p.cost_per_sq_ft,
p.year_built,
p.sales_date,
p.sales_price,
p.id
FROM (
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
JOIN property AS p
ON p.parcel_number = open_cases.APN
LIMIT 0, 1000;
mysql> show processlist;
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
| 21120 | headsupcity | localhost | lead_housing | Query | 21 | Copying to tmp table | SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
p.street_numbe |
| 21121 | headsupcity | localhost | lead_housing | Query | 0 | NULL | show processlist |
+-------+-------------+-----------+--------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Explains are different on my local machine and on the server, and I'm assuming the only reason my query runs at all on my local machine, is because of the key that is automatically created on the derived table:
Explain (local):
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
| 1 | PRIMARY | p | NULL | ALL | NULL | NULL | NULL | NULL | 40319 | 100.00 | Using temporary |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 8 | lead_housing.p.parcel_number | 40 | 100.00 | NULL |
| 2 | DERIVED | pi | NULL | ALL | NULL | NULL | NULL | NULL | 1623978 | 100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+------------------------------+---------+----------+---------------------------------+
Explain (server):
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
| 1 | PRIMARY | p | ALL | NULL | NULL | NULL | NULL | 41369 | Using temporary |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 122948 | Using where; Distinct; Using join buffer |
| 2 | DERIVED | pi | ALL | NULL | NULL | NULL | NULL | 1718586 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+------------------------------------------+
Schemas:
mysql> explain property_inspection;
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lblCaseNo | int(11) | NO | MUL | NULL | |
| APN | bigint(10) | NO | MUL | NULL | |
| date | varchar(50) | NO | | NULL | |
| status | varchar(500) | NO | | NULL | |
| property_case_detail_id | int(11) | YES | MUL | NULL | |
| case_type_id | int(11) | YES | MUL | NULL | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| update_status | tinyint(1) | YES | | 1 | |
| created_date | datetime | NO | | NULL | |
+-------------------------+--------------+------+-----+-------------------+-----------------------------+
10 rows in set (0.02 sec)
mysql> explain property; (not all columns, but you get the gist)
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| parcel_number | bigint(10) | NO | | 0 | |
| date_modified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| created_date | datetime | NO | | NULL | |
+----------------------------+--------------+------+-----+-------------------+-----------------------------+
Variables that might be relevant:
tmp_table_size: 16777216
innodb_buffer_pool_size: 8589934592
Any ideas on how to optimize this, and any idea why the explains are so different?
Since this is where the Optimizers are quite different, let's try to optimize
SELECT APN, property_case_detail_id FROM property_inspection AS pi
GROUP BY APN, property_case_detail_id
HAVING
COUNT(IF(status='Resolved Date', 1, NULL)) = 0
) as open_cases
Give this a try:
SELECT ...
FROM property AS p
WHERE NOT EXISTS ( SELECT 1 FROM property_inspection
WHERE status = 'Resolved Date'
AND p.parcel_number = APN )
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
or...
SELECT ...
FROM property AS p
LEFT JOIN property_inspection AS pi ON p.parcel_number = pi.APN
AND pi.status = 'Resolved Date'
WHERE pi.APN IS NULL
ORDER BY ??? -- without this, the `LIMIT` is unpredictable
LIMIT 0, 1000;
Index:
property_inspection: INDEX(status, APN) -- in either order
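In the LEFT JOIN form of this anti-join, the status test has to sit in the ON clause. If it is placed in WHERE together with the IS NULL test, no row can satisfy both conditions. The sketch below checks this on a toy dataset. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the tables are cut down to hypothetical minimal columns, so it demonstrates the query logic only, not MySQL's execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_number INTEGER);
    CREATE TABLE property_inspection (id INTEGER PRIMARY KEY, APN INTEGER, status TEXT);
    INSERT INTO property (parcel_number) VALUES (100), (200), (300);
    -- Parcel 100 has a resolved case, 200 only an open one, 300 no inspections.
    INSERT INTO property_inspection (APN, status) VALUES
        (100, 'Open'), (100, 'Resolved Date'), (200, 'Open');
""")

queries = {
    "not_exists": """
        SELECT p.parcel_number FROM property AS p
        WHERE NOT EXISTS (SELECT 1 FROM property_inspection AS pi
                          WHERE pi.status = 'Resolved Date'
                            AND pi.APN = p.parcel_number)
        ORDER BY p.parcel_number""",
    "left_join_on": """
        SELECT p.parcel_number FROM property AS p
        LEFT JOIN property_inspection AS pi
               ON pi.APN = p.parcel_number AND pi.status = 'Resolved Date'
        WHERE pi.APN IS NULL
        ORDER BY p.parcel_number""",
    # Status test moved into WHERE: contradicts pi.APN IS NULL, so nothing matches.
    "left_join_where": """
        SELECT p.parcel_number FROM property AS p
        LEFT JOIN property_inspection AS pi ON pi.APN = p.parcel_number
        WHERE pi.status = 'Resolved Date' AND pi.APN IS NULL
        ORDER BY p.parcel_number""",
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(results)
```

Note that parcel 300, which has no inspections at all, is returned by both working forms. The original derived-table query would not return it, because it only joins parcels that appear in property_inspection.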
MySQL 5.5 and 5.7 are quite different, and the latter has a better optimizer, so it is no surprise that the explain plans differ.
You'd better provide the SHOW CREATE TABLE property; and SHOW CREATE TABLE property_inspection; outputs, as they will show the indexes on your tables.
Your sub-query is the issue.
- The server tries to process 1.6M rows with no index while grouping everything.
- HAVING is quite an expensive operation, so you'd better avoid it, especially in sub-queries.
- Grouping in this case is a bad idea. You do not need the aggregation/counting. You only need to check whether a 'Resolved Date' status exists.
Based on the information provided I'd recommend:
- Alter table property_inspection to reduce the length of the status column.
- Add an index on the column. Use a covering index (APN, property_case_detail_id, status) if possible (in this column order).
- Change the query to something like this:
SELECT
SQL_CALC_FOUND_ROWS
DISTINCT p.parcel_number,
...
p.id
FROM
property_inspection AS `pi1`
INNER JOIN property AS p ON (
p.parcel_number = `pi1`.APN
)
LEFT JOIN (
SELECT
`pi2`.property_case_detail_id
, `pi2`.APN
FROM
property_inspection AS `pi2`
WHERE
`status` = 'Resolved Date'
) AS exclude ON (
exclude.APN = `pi1`.APN
AND exclude.property_case_detail_id = `pi1`.property_case_detail_id
)
WHERE
exclude.APN IS NULL
LIMIT
0, 1000;

postgresql returns null but mysql doesn't

I have an application that I am migrating from MySQL to PostgreSQL.
I have three tables t1, t2, t3 described below. Table t3 will always have an entry as long as the user is available, but tables t1 and t2 may or may not have entries if the user hasn't created a DB in their account.
When I execute q1 in MySQL, I get a result set containing the values fetched from table t3 even if t1 and t2 have no entry, but it returns null in PostgreSQL. So I've written two queries, pq1 and pq2, to be equivalent to q1. Why does MySQL not return null values while PostgreSQL does? Is there a better solution than breaking this into two queries for PostgreSQL?
mysql query (q1)
select
COALESCE(sum(dr.NO_OF_QT),0),
ur.NO_OF_USERS, ur.NO_OF_DB, 0,
COALESCE(sum(dr.NO_OF_SM),0),
COALESCE(sum(dr.NO_OF_RPTS),0)
from DataBaseProps dr
left join DataBaseDetails db on dr.DB_ID=db.ID and db.STATUS=1
left join UserProps ur on db.OWNER_UID=ur.USER_UID
where ur.USER_UID='USER_UID'
Psql-query_1 (pq1)
select
COALESCE(sum(dr.NO_OF_QT),0),
0,0, 0,
COALESCE(sum(dr.NO_OF_SM),0),
COALESCE(sum(dr.NO_OF_RPTS),0)
from DataBaseProps dr
left join DataBaseDetails db on dr.DB_ID=db.ID and db.STATUS=1
left join UserProps ur on db.OWNER_UID=ur.USER_UID
where ur.USER_UID='USER_UID'
psql-query_2(pq2)
select NO_OF_USERS,NO_OF_DB from UserProps
where USER_UID='USER_UID'
Table 1 DataBaseProps (t1)
desc DataBaseProps >
+-----------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------+------+-----+---------+-------+
| DB_ID | bigint(19) | NO | PRI | NULL | |
| NO_OF_RPTS | int(10) | YES | | 0 | |
| NO_OF_QT | int(10) | YES | | 0 | |
| NO_OF_SM | int(10) | YES | | 0 | |
+-----------------+------------+------+-----+---------+-------+
Table 2 - DataBaseDetails(t2)
desc DataBaseDetails>
+-------------------+--------------+------+-----+-----------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+-----------------+-------+
| ID | bigint(19) | NO | PRI | NULL | |
| NAME | varchar(50) | NO | | NULL | |
| STATUS | int(10) | NO | | 1 | |
| OWNER_UID | bigint(19) | NO | | NULL | |
+-------------------+--------------+------+-----+-----------------+-------+
Table 3 UserProps(t3)
desc UserProps>
+-----------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------+------+-----+---------+-------+
| USER_UID | bigint(19) | NO | PRI | NULL | |
| NO_OF_DB | int(10) | YES | | 0 | |
| NO_OF_USERS | int(10) | YES | | 0 | |
+-----------------+------------+------+-----+---------+-------+

How can I optimize this mysql query to find maximum simultaneous calls?

I'm trying to calculate maximum simultaneous calls. My query, which I believe to be accurate, takes way too long given ~250,000 rows. The cdrs table looks like this:
+---------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| CallType | varchar(32) | NO | | NULL | |
| StartTime | datetime | NO | MUL | NULL | |
| StopTime | datetime | NO | | NULL | |
| CallDuration | float(10,5) | NO | | NULL | |
| BillDuration | mediumint(8) unsigned | NO | | NULL | |
| CallMinimum | tinyint(3) unsigned | NO | | NULL | |
| CallIncrement | tinyint(3) unsigned | NO | | NULL | |
| BasePrice | float(12,9) | NO | | NULL | |
| CallPrice | float(12,9) | NO | | NULL | |
| TransactionId | varchar(20) | NO | | NULL | |
| CustomerIP | varchar(15) | NO | | NULL | |
| ANI | varchar(20) | NO | | NULL | |
| ANIState | varchar(10) | NO | | NULL | |
| DNIS | varchar(20) | NO | | NULL | |
| LRN | varchar(20) | NO | | NULL | |
| DNISState | varchar(10) | NO | | NULL | |
| DNISLATA | varchar(10) | NO | | NULL | |
| DNISOCN | varchar(10) | NO | | NULL | |
| OrigTier | varchar(10) | NO | | NULL | |
| TermRateDeck | varchar(20) | NO | | NULL | |
+---------------+-----------------------+------+-----+---------+----------------+
I have the following indexes:
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs | 0 | PRIMARY | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | id | 1 | id | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 1 | StartTime | A | 269622 | NULL | NULL | | BTREE | | |
| cdrs | 1 | call_time_index | 2 | StopTime | A | 269622 | NULL | NULL | | BTREE | | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The query I am running is this:
SELECT MAX(cnt) AS max_channels FROM
(SELECT cl1.StartTime, COUNT(*) AS cnt
FROM cdrs cl1
INNER JOIN cdrs cl2
ON cl1.StartTime
BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id)
AS counts;
It seems like I might have to chunk this data for each day and store the results in a separate table like simultaneous_calls.
I'm sure you want to know not only the maximum simultaneous calls, but when that happened.
I would create a table containing the timestamp of every individual minute
CREATE TABLE times (ts DATETIME NOT NULL PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
. . . until 1440 rows, one for each minute . . .
Then join that to the calls.
SELECT ts, COUNT(*) AS count FROM times
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime
GROUP BY ts ORDER BY count DESC LIMIT 1;
Here's the result in my test (MySQL 5.6.17 on a Linux VM running on a Macbook Pro):
+---------------------+----------+
| ts | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 | 1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)
This achieves several goals:
Reduces the number of rows examined by two orders of magnitude.
Reduces the execution time from 3 hours+ to about 1 minute.
Also returns the actual timestamp when the highest count was found.
Here's the EXPLAIN for my query:
explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | times | index | PRIMARY | PRIMARY | 5 | NULL | 1440 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | cdrs | ALL | starttime | NULL | NULL | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
Notice the figures in the rows column, and compare to the EXPLAIN of your original query. You can estimate the total number of rows examined by multiplying these together (but that gets more complicated if your query is anything other than SIMPLE).
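The per-minute sampling idea can also be sketched outside SQL to show what the join computes. This is a minimal illustration on made-up data, not a replacement for the query: the function name and the sample calls are hypothetical, and each call is a (StartTime, StopTime) pair. For each of the 1440 minute marks of one day it counts the calls that are active at that instant, using the same inclusive bounds as BETWEEN.

```python
from datetime import datetime, timedelta


def max_concurrent_by_minute(calls, day_start):
    """Return (timestamp, count) for the busiest of the 1440 minute marks.

    Mirrors: times.ts BETWEEN cdrs.StartTime AND cdrs.StopTime (inclusive).
    """
    best_ts, best_count = None, 0
    for minute in range(1440):
        ts = day_start + timedelta(minutes=minute)
        count = sum(1 for start, stop in calls if start <= ts <= stop)
        if count > best_count:
            best_ts, best_count = ts, count
    return best_ts, best_count


day = datetime(2014, 5, 14)
# Hypothetical calls as (StartTime, StopTime) pairs
calls = [
    (datetime(2014, 5, 14, 10, 58, 30), datetime(2014, 5, 14, 11, 0, 10)),
    (datetime(2014, 5, 14, 10, 58, 50), datetime(2014, 5, 14, 10, 59, 30)),
    # Starts and stops between two minute marks, so no sample ever sees it.
    (datetime(2014, 5, 14, 10, 59, 10), datetime(2014, 5, 14, 10, 59, 40)),
]

peak = max_concurrent_by_minute(calls, day)
print(peak)
```

In this toy data the sampled peak is 2 at 10:59:00, although three calls overlap between 10:59:10 and 10:59:30. Sampling once per minute trades that precision for far fewer rows examined; a finer grid in the times table narrows the gap.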
The inline view isn't strictly necessary. (You're right that it takes a long time to run the EXPLAIN on the query with the inline view: the EXPLAIN will materialize the inline view, i.e. run the inline view query and populate the derived table, and then give an EXPLAIN on the outer query.)
Note that this query will return an equivalent result:
SELECT COUNT(*) AS max_channels
FROM cdrs cl1
JOIN cdrs cl2
ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
GROUP BY cl1.id
ORDER BY max_channels DESC
LIMIT 1
Though it still has to do all the work, and probably doesn't perform any better; the EXPLAIN should run a lot faster. (We expect to see "Using temporary; Using filesort" in the Extra column.)
The number of rows in the resultset is going to be the number of rows in the table (~250,000 rows), and those are going to need to be sorted, so that's going to be some time there. The bigger issue (my gut is telling me) is that join operation.
I'm wondering if the EXPLAIN (or performance) would be any different if you swapped the cl1 and cl2 in the predicate, i.e.
ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
I'm thinking that, just because I'd be tempted to try a correlated subquery. That's ~250,000 executions, and that's not likely going to be any faster...
SELECT ( SELECT COUNT(*)
FROM cdrs cl2
WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
) AS max_channels
, cl1.StartTime
FROM cdrs cl1
ORDER BY max_channels DESC
LIMIT 11
You could run an EXPLAIN on that, we're still going to see a "Using temporary; Using filesort", and it will also show the "dependent subquery"...
Obviously, adding a predicate on the cl1 table to cut down the number of rows to be returned (for example, checking only the past 15 days) should speed things up, but it doesn't get you the answer you want.
WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
(None of my musings here are sure-fire answers to your question, or solutions to the performance issue; they're just musings.)

Change data type from varchar to enum

I have a table for which I've recently changed the type of several columns from varchar to enum (see below). My app queries against this table on both of these columns and, once the type change was made, I have seen serious performance degradation for this query (I've included the query below as well as the explain plan results). I've so far been unable to find a culprit here and was hoping someone had run into this problem and could advise.
desc order_transmission_history;
+--------------------------+--------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+--------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| transmission_id | varchar(255) | YES | | NULL | |
| transmitter_type | varchar(10) | YES | MUL | NULL | |
| initial_attempt_date | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| most_recent_attempt_date | timestamp | NO | | 0000-00-00 00:00:00 | |
| most_recent_status | varchar(16) | YES | | NULL | |
+--------------------------+--------------+------+-----+---------------------+----------------+
the index is: KEY transmission_history_transmitter_status_date (transmitter_type,most_recent_status,initial_attempt_date)
explain SELECT * FROM order_transmission_history where transmitter_type = 'FAX_1' AND transmission_id = '' AND (most_recent_status is null or (most_recent_status not in ('SENT', 'ERROR')));
+----+-------------+----------------------+-------+-------------------------------------------------------------------------------------------+----------------------------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+-------------------------------------------------------------------------------------------+----------------------------------------------+---------+------+------+-------------+
| 1 | SIMPLE | transmission_history | range | transmission_history_transmitter_status_date | transmission_history_transmitter_status_date | 32 | NULL | 350 | Using where |
+----+-------------+----------------------+-------+-------------------------------------------------------------------------------------------+----------------------------------------------+---------+------+------+-------------+
Now, with the changed data types:
+--------------------------------------+----------------------------------------------------------------------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------------+----------------------------------------------------------------------------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| transmission_id | varchar(255) | YES | | NULL | |
| initial_attempt_date | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| most_recent_attempt_date | timestamp | YES | | NULL | |
| transmitter_type | enum('FAX_1','FAX_2','FAX_3','EMAIL') | YES | MUL | NULL | |
| most_recent_status | enum('NONE','PENDING','TRANSIENT_ERROR','ERROR','SENDING','SENT','SYSTEM_ERROR') | YES | | NULL | |
+--------------------------------------+----------------------------------------------------------------------------------+------+-----+-------------------+----------------+
explain SELECT * FROM order_transmission_history where transmitter_type = 'FAX_1' AND transmission_id = '' AND (most_recent_status is null or (most_recent_status not in ('SENT', 'ERROR')));
+----+-------------+----------------------------+------+----------------------------------------------+----------------------------------------------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------+------+----------------------------------------------+----------------------------------------------+---------+-------+--------+-------------+
| 1 | SIMPLE | order_transmission_history | ref | transmission_history_transmitter_status_date | transmission_history_transmitter_status_date | 2 | const | 394992 | Using where |
+----+-------------+----------------------------+------+----------------------------------------------+----------------------------------------------+---------+-------+--------+-------------+
Because your WHERE clause contains
most_recent_status is null OR (most_recent_status not in ('SENT', 'ERROR'))
the planner will not use your key for that condition, because there is no way to use a key with such a WHERE clause.
So the only thing it can use is
transmitter_type = 'FAX_1' AND transmission_id = ''
but in your case the planner thinks there are many rows with those values in the index, so using the index gives no advantage.
You can force the use of the index, but I don't think it will help. You probably need to rewrite your query to be more specific (for example, add "order by most_recent_attempt_date limit 10" and create a key with most_recent_attempt_date first).
You can also get more performance by not using NULL in most_recent_status (put 'undefined' in the enum) and by querying for explicit status values instead of using set operators (in/not in).
It has been my experience that MySQL does not like to use an index when you have a where not in or <> on an enum field. Try flipping your query to check for explicit values instead.
SELECT * FROM order_transmission_history
where transmitter_type = 'FAX_1'
AND transmission_id = ''
AND (most_recent_status is null or
(most_recent_status in
('NONE','PENDING','TRANSIENT_ERROR','SENDING','SYSTEM_ERROR')));
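The explicit list in the rewritten query is the complement of the excluded values within the enum definition. If the enum changes over time, the list can be derived rather than typed by hand. The helper below is a hypothetical sketch (its name is not part of any library) that builds the IN list from the enum values shown in the table definition above. It concatenates trusted constants only; values that come from user input should go through query parameters instead.

```python
# Enum values copied from the most_recent_status column definition
STATUS_ENUM = ["NONE", "PENDING", "TRANSIENT_ERROR", "ERROR",
               "SENDING", "SENT", "SYSTEM_ERROR"]


def positive_in_list(enum_values, excluded):
    """Turn `NOT IN (excluded)` into the equivalent explicit IN list."""
    excluded_set = set(excluded)
    kept = [value for value in enum_values if value not in excluded_set]
    return ", ".join("'" + value + "'" for value in kept)


in_list = positive_in_list(STATUS_ENUM, ["SENT", "ERROR"])
print("most_recent_status IN (" + in_list + ")")
```

The NULL case is not covered by the IN list, so the `most_recent_status is null` branch of the query still has to be kept.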