MySQL Optimize 500M+ row table - mysql

I have a fairly simple table containing 500 million+ rows. Even very simple queries take between 2 and 3.5 minutes. I have an index on the column used in the WHERE clause.
I am wondering what I can do to optimize this table and/or query?
THE QUERY & RESULTS
mysql> SELECT COUNT(emails_id) AS count FROM person_deliveries WHERE DATE(date) = '2021-08-23' ;
+--------+
| count |
+--------+
| 539438 |
+--------+
1 row in set (2 min 20.05 sec)
EXPLAIN QUERY
mysql> EXPLAIN SELECT COUNT(emails_id) AS count FROM person_deliveries WHERE DATE(date) = '2021-08-23' ;
+----+-------------+-------------------+------------+-------+---------------+----------------------+---------+------+-----------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------------+-------+---------------+----------------------+---------+------+-----------+----------+--------------------------+
| 1 | SIMPLE | person_deliveries | NULL | index | NULL | campaigns_id | 4 | NULL | 454956815 | 100.00 | Using where; Using index |
+----+-------------+-------------------+------------+-------+---------------+----------------------+---------+------+-----------+----------+--------------------------+
SHOW CREATE TABLE
CREATE TABLE `person_deliveries` (
`emails_id` int unsigned NOT NULL,
`campaigns_id` int NOT NULL,
`date` datetime NOT NULL,
`vmta` varchar(255) DEFAULT NULL,
`ip_address` varchar(15) DEFAULT NULL,
`domain` varchar(255) DEFAULT NULL,
UNIQUE KEY `person_campaign_date` (`emails_id`,`campaigns_id`,`date`),
KEY `ip_address` (`ip_address`),
KEY `domain` (`domain`),
KEY `campaigns_id` (`campaigns_id`),
KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Thank you in advance!

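As Silvanu & DRapp commented, my use of the DATE() function was slowing the query down: when the column is wrapped in a function call, the index on date cannot be used for the lookup, so MySQL had to examine every row. Comparing the bare column against a range fixes it: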
mysql> SELECT COUNT(emails_id) AS count FROM person_deliveries WHERE date >= '2021-08-23 00:00:00' AND date <= '2021-08-23 23:59:59' ;
+--------+
| count |
+--------+
| 539438 |
+--------+
1 row in set (0.47 sec)
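A variant worth considering is a half-open range, which avoids hard-coding 23:59:59 and would still be correct if the column ever stored fractional seconds. This is only a sketch against the table shown above; COUNT(*) is used because emails_id is declared NOT NULL, so both forms should return the same count.

```sql
-- Half-open range: from midnight (inclusive) up to the next midnight (exclusive).
-- The bare `date` column is compared directly, so the KEY `date` index can be used.
SELECT COUNT(*) AS count
FROM person_deliveries
WHERE `date` >= '2021-08-23'
  AND `date` <  '2021-08-23' + INTERVAL 1 DAY;
```

Running EXPLAIN on this statement should show type range with key date rather than a full index scan.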

Related

How to use correct indexes with a double inner join query?

I have a query with 2 INNER JOIN clauses that fetches only a few columns, but it is very slow even though I have indexes on all the required columns.
My query
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
INNER JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
com.prestataireLAD REGEXP '.*'
AND pe_nom REGEXP 'bordeaux|chambéry-annecy|grenoble|lyon|marseille|metz|montpellier|nancy|nice|nimes|rouen|strasbourg|toulon|toulouse|vitry|vitry bis 1|vitry bis 2|vlg'
AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
AND '2022-07-08 00:00:00';
It takes around 20 seconds to compute and fetch 4123 rows.
The problem
To find out why it is so slow, I used the EXPLAIN statement. Here is the output:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|--------|----------------------------|-------------|---------|------------------------|--------|----------|-------------|
| 1 | SIMPLE | dys | | ALL | id_commande,id_commande_2 | | | | 878588 | 100.00 | Using where |
| 1 | SIMPLE | com | | eq_ref | id_commande,date_livraison | id_commande | 110 | db.dys.id_commande | 1 | 7.14 | Using where |
| 1 | SIMPLE | pe | | ref | pe_id | pe_id | 5 | db.com.code_pe | 1 | 100.00 | Using where |
I can see that the dysfonctionnements table is read with a full table scan (type ALL) and doesn't use a key even though it could...
Table definitions
commandes (included relevant columns only)
CREATE TABLE `commandes` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) NOT NULL DEFAULT '',
`date_commande` datetime NOT NULL,
`date_livraison` datetime NOT NULL,
`code_pe` int(11) NOT NULL,
`traitement_dysfonctionnement` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`),
KEY `date_livraison` (`date_livraison`),
KEY `traitement_dysfonctionnement` (`traitement_dysfonctionnement`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
dysfonctionnements (again, relevant columns only)
CREATE TABLE `dysfonctionnements` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) DEFAULT NULL,
`dysfonctionnement` varchar(150) DEFAULT NULL,
`responsable` varchar(50) DEFAULT NULL,
`reimputation` varchar(50) DEFAULT NULL,
`montant` float DEFAULT NULL,
`listRembArticles` text,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`,`dysfonctionnement`),
KEY `id_commande_2` (`id_commande`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
pe (again, relevant columns only)
CREATE TABLE `pe` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`pe_id` int(11) DEFAULT NULL,
`pe_nom` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pe_nom` (`pe_nom`),
KEY `pe_id` (`pe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Investigation
If I remove the db.pe table from the query and the WHERE clause on pe_nom, the query takes 1.7 seconds to fetch 7k rows, and with the EXPLAIN statement, I can see it is using keys as I expect it to do:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|-------|----------------------------|----------------|---------|------------------------|--------|----------|-----------------------------------------------|
| 1 | SIMPLE | com | | range | id_commande,date_livraison | date_livraison | 5 | | 389558 | 100.00 | Using index condition; Using where; Using MRR |
| 1 | SIMPLE | dys | | ref | id_commande,id_commande_2 | id_commande_2 | 111 | ooshop.com.id_commande | 1 | 100.00 | |
I'm open to any suggestions, I see no reason not to use the key when it does on a very similar query and it definitely makes it faster...
I had a similar experience where the MySQL optimiser chose a join order that was far from optimal. At the time I used the MySQL-specific STRAIGHT_JOIN operator to override the optimiser's default behaviour. In your case I would try this:
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
Also, one of the REGEXP conditions in your WHERE clause could probably be replaced with the IN operator, which I assume can use an index.
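Putting the two suggestions together, the full query might look like the sketch below. It assumes that pe_nom holds exactly the listed names: REGEXP matches substrings, while IN matches whole values only, so the two are equivalent only under that assumption. The date range is copied from the question.

```sql
SELECT
    dysfonctionnement,
    montant,
    listRembArticles,
    CASE WHEN dys.reimputation IS NOT NULL
         THEN dys.reimputation
         ELSE dys.responsable
    END AS responsable_final
FROM db.commandes AS com
-- STRAIGHT_JOIN forces commandes to be read before dysfonctionnements
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE pe.pe_nom IN ('bordeaux', 'chambéry-annecy', 'grenoble', 'lyon', 'marseille',
                    'metz', 'montpellier', 'nancy', 'nice', 'nimes', 'rouen',
                    'strasbourg', 'toulon', 'toulouse', 'vitry', 'vitry bis 1',
                    'vitry bis 2', 'vlg')
  AND com.date_livraison BETWEEN '2022-06-11 00:00:00' AND '2022-07-08 00:00:00';
```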
Remove com.prestataireLAD REGEXP '.*'. The Optimizer probably won't realize that this has no impact on the resultset. If you are dynamically building the WHERE clause, then eliminate anything else you can.
id_commande_2 is redundant. In queries where it might be useful, the UNIQUE can take care of it.
These indexes might help:
com: INDEX(date_livraison, id_commande, code_pe)
pe: INDEX(pe_nom, pe_id)
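As DDL, the index suggestions above could be applied roughly as follows. The index names idx_livraison_commande_pe and idx_nom_id are made up for illustration; the column lists are the ones given in the answer.

```sql
-- Drop the index that duplicates the leading column of the UNIQUE key
ALTER TABLE db.dysfonctionnements DROP INDEX id_commande_2;

-- Range column first, then the columns needed for the joins
ALTER TABLE db.commandes
    ADD INDEX idx_livraison_commande_pe (date_livraison, id_commande, code_pe);

-- Lets the pe_nom filter and the pe_id join be served from one index
ALTER TABLE db.pe
    ADD INDEX idx_nom_id (pe_nom, pe_id);
```

Re-running EXPLAIN afterwards is the way to confirm whether the optimizer actually picks the new indexes.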

can't improve query performance of query

I have the following 2 tables:
CREATE TABLE table1 (
ID INT(11) NOT NULL AUTO_INCREMENT,
AccountID INT NOT NULL,
Type VARCHAR(50) NOT NULL,
ValidForBilling BOOLEAN NULL DEFAULT false,
MerchantCreationTime TIMESTAMP NOT NULL,
PRIMARY KEY (ID),
UNIQUE KEY (OrderID, Type)
);
with the index:
INDEX accID_type_merchCreatTime_vfb (AccountID, Type, MerchantCreationTime, ValidForBilling);
CREATE TABLE table2 (
OrderID INT NOT NULL,
AccountID INT NOT NULL,
LineType VARCHAR(256) NOT NULL,
CreationDate TIMESTAMP NOT NULL,
CalculatedAmount NUMERIC(4,4) NULL,
table1ID INT(11) NOT NULL
);
I'm running the following query:
SELECT COALESCE(SUM(CalculatedAmount), 0.0) AS CalculatedAmount
FROM table2
INNER JOIN table1 ON table1.ID = table2.table1ID
WHERE table1.ValidForBilling is TRUE
AND table1.AccountID = 388
AND table1.Type = 'TPG_DISCOUNT'
AND table1.MerchantCreationTime >= '2018-11-01T05:00:00'
AND table1.MerchantCreationTime < '2018-12-01T05:00:00';
And it takes about 2 minutes to complete.
I did EXPLAIN in order to try and improve the query performance and got the following output:
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
| 1 | SIMPLE | table1 | NULL | range | PRIMARY,i_fo_merchant_time_account,FO_AccountID_MerchantCreationTime,FO_AccountID_ExecutionTime,FO_AccountID_Type_ExecutionTime,FO_AccountID_Type_MerchantCreationTime,accID_type_merchCreatTime_vfb | accID_type_merchCreatTime_vfb | 61 | NULL | 71276 | 100.00 | Using where; Using index |
| 1 | SIMPLE | table2 | NULL | eq_ref | table1ID,i_oc_fo_id | table1ID | 4 | finance.table1.ID | 1 | 100.00 | NULL |
+----+-------------+------------------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------+----------------------+-------+----------+--------------------------+
I see that I scan 71276 rows in table1 and I can't seem to make this number lower.
Is there an index I can create to improve this query performance?
Move ValidForBilling before MerchantCreationTime in accID_type_merchCreatTime_vfb. In a composite index, the columns compared by equality (such as ValidForBilling = TRUE) need to come before the column used in a range condition.
For table2, there seems to be a table1ID index already; appending CalculatedAmount to it lets the index supply the value needed for the result:
CREATE INDEX tbl1IDCalcAmount ON table2 (table1ID, CalculatedAmount);
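The reordered index on table1 could be created as in the sketch below. The index name accID_type_vfb_merchCreatTime is hypothetical. The query is also rewritten with ValidForBilling = TRUE instead of IS TRUE, on the assumption that the column only ever holds 0, 1 or NULL; an equality comparison is the form the optimizer can treat as an index lookup.

```sql
-- Equality columns (AccountID, Type, ValidForBilling) first, range column last
ALTER TABLE table1
    ADD INDEX accID_type_vfb_merchCreatTime
        (AccountID, Type, ValidForBilling, MerchantCreationTime);

SELECT COALESCE(SUM(CalculatedAmount), 0.0) AS CalculatedAmount
FROM table2
INNER JOIN table1 ON table1.ID = table2.table1ID
WHERE table1.ValidForBilling = TRUE
  AND table1.AccountID = 388
  AND table1.Type = 'TPG_DISCOUNT'
  AND table1.MerchantCreationTime >= '2018-11-01T05:00:00'
  AND table1.MerchantCreationTime <  '2018-12-01T05:00:00';
```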

why prefix index is slower than index in mysql?

Table (about 21 million rows):
CREATE TABLE `prefix` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`number` int(11) NOT NULL,
`string` varchar(750) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_string_prefix10` (`string`(10)),
KEY `idx_string` (`string`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Selectivity of the 10-character prefix:
select count(distinct(left(string,10)))/count(*) from prefix;
+-------------------------------------------+
| count(distinct(left(string,10)))/count(*) |
+-------------------------------------------+
| 0.9999 |
+-------------------------------------------+
Results:
select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b"
243.96s, 241.88s
select sql_no_cache count(*) from prefix force index(idx_string)
where string < "1505d28b"
7.96s, 7.21s, 7.53s
Why is the prefix index slower than the full index in MySQL? (Forgive my broken English.)
explain select sql_no_cache count(*) from prefix force index(idx_string_prefix10)
where string < "1505d28b";
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | prefix | NULL | range | idx_string_prefix10 | idx_string_prefix10 | 42 | NULL | 3489704 | 100.00 | Using where |
+----+-------------+--------+------------+-------+---------------------+---------------------+---------+------+---------+----------+-------------+
When you use a prefix index, MySQL has to read from the index and also after reading the index, it has to read the row of data too, to make sure the value is selected by the WHERE condition. That's two reads, and scanning a lot more data.
When you use a non-prefix index, MySQL can read the whole string value from the index, and it knows immediately whether the value is selected by the condition, or if it can be skipped.
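One way to see this difference is to compare the EXPLAIN output of the two forced plans. In the sketch below, the Extra column for idx_string would be expected to include "Using index" (a covering index scan, no row reads), whereas the prefix-index plan in the question shows only "Using where".

```sql
-- Full-column index: the condition can be checked from the index alone
EXPLAIN SELECT COUNT(*)
FROM prefix FORCE INDEX (idx_string)
WHERE string < '1505d28b';

-- Prefix index: each candidate entry requires a lookup of the full row
EXPLAIN SELECT COUNT(*)
FROM prefix FORCE INDEX (idx_string_prefix10)
WHERE string < '1505d28b';
```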

mysql huge table query optimization group by

I have a huge table with about 40 million rows (GPS tracker positions), recorded every 10 seconds from multiple devices inside the company. I want to select only the first row of every minute, so I used GROUP BY. The problem is that the table grows every 10 seconds. I've tried almost everything and googled for many hours, so I decided to ask a question.
I'm using MySQL 5.7.11 with a 50GB InnoDB buffer pool; the server is a Xeon X5650 with 64GB RAM.
table structure:
CREATE TABLE `eventData` (
`id` bigint(20) NOT NULL,
`position` point NOT NULL,
`speed` decimal(6,2) DEFAULT NULL,
`time` datetime DEFAULT NULL,
`device_id` int(9) DEFAULT NULL,
`processed` tinyint(1) NOT NULL DEFAULT '0',
`time_m` datetime GENERATED ALWAYS AS ((`time` - interval second(`time`) second)) VIRTUAL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci ROW_FORMAT=DYNAMIC;
ALTER TABLE `eventData`
ADD PRIMARY KEY (`id`),
ADD KEY `time` (`time`),
ADD KEY `device_id` (`device_id`,`processed`),
ADD KEY `time_m` (`time_m`);
SQL:
SELECT e.time, e.time_m, X(e.position) AS lat, Y(e.position) AS lng
FROM eventData AS e
WHERE
e.device_id = 86 AND
e.time BETWEEN '2016-02-29' AND '2016-03-06'
GROUP BY DAY(e.time),HOUR(e.time),MINUTE(e.time);
Explain:
EXPLAIN SELECT e.time, e.time_m, X(e.position) AS lat, Y(e.position) AS lng FROM eventData AS e WHERE e.device_id = 86 AND e.time BETWEEN '2016-02-29' AND '2016-03-06' GROUP BY DAY(e.time),HOUR(e.time),MINUTE(e.time);
+----+-------------+-------+------------+------+----------------+-----------+---------+-------+---------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+----------------+-----------+---------+-------+---------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | e | NULL | ref | time,device_id | device_id | 5 | const | 2122632 | 6.40 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+-------+------------+------+----------------+-----------+---------+-------+---------+----------+---------------------------------------------------------------------+
describe:
DESCRIBE eventData;
+------------------+------------------------+------+-----+---------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------------+------+-----+---------+-------------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| position | point | NO | | NULL | |
| speed | decimal(6,2) | YES | | NULL | |
| time | datetime | YES | MUL | NULL | |
| device_id | int(9) | YES | MUL | NULL | |
| processed | tinyint(1) | NO | | 0 | |
| time_m | datetime | YES | MUL | NULL | VIRTUAL GENERATED |
+------------------+------------------------+------+-----+---------+-------------------+
I've tried:
without group by: ~0.06s
group by day,hour,minute: ~4.76s
group by virtual column (time_m): ~4.92s
group by e.time DIV 500: ~5.02s
I need to achieve better results than 5 seconds. Please help.
You could partition the table, for example by year. This may improve performance because each partition has much smaller indexes.
If this is not possible in your environment, try
GROUP BY date_format(e.time,'%d%H%i');
1) You can try composite index (device_id, time)
2) Try to group by virtual field:
SELECT MIN(e.time), e.time_m, X(e.position) AS lat, Y(e.position) AS lng
FROM eventData AS e
WHERE
e.device_id = 86 AND
e.time BETWEEN '2016-02-29' AND '2016-03-06'
GROUP BY e.time_m;
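A sketch of both suggestions combined: the composite index, plus a derived table that finds the earliest timestamp in each minute and joins back to fetch that row's position. The index name idx_device_time is made up. The join-back form is used here because selecting non-aggregated columns alongside GROUP BY is rejected when ONLY_FULL_GROUP_BY is enabled and otherwise returns values from an arbitrary row of each group. If a device can have two rows with the same timestamp, this returns both.

```sql
-- Equality column first, range column second
ALTER TABLE eventData ADD INDEX idx_device_time (device_id, `time`);

SELECT e.`time`, e.time_m, X(e.position) AS lat, Y(e.position) AS lng
FROM eventData AS e
JOIN (
    -- earliest timestamp within each minute for this device
    SELECT MIN(`time`) AS first_time
    FROM eventData
    WHERE device_id = 86
      AND `time` BETWEEN '2016-02-29' AND '2016-03-06'
    GROUP BY time_m
) AS f ON e.`time` = f.first_time
WHERE e.device_id = 86;
```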

Slow inner join order query

I have a problem with query speed. My question is similar to "Mysql slow query: INNER JOIN + ORDER BY causes filesort", but I can't find a solution there. EXPLAIN says that MySQL is using: Using index condition; Using where; Using temporary; Using filesort on the companies table.
Slow query:
SELECT * FROM companies
INNER JOIN post_indices
ON companies.post_index_id = post_indices.id
WHERE companies.deleted_at is NULL
ORDER BY post_indices.id
LIMIT 1;
# 1 row in set (5.62 sec)
But if I remove the WHERE clause from the query, it is really fast:
SELECT * FROM companies
INNER JOIN post_indices
ON companies.post_index_id = post_indices.id
ORDER BY post_indices.id
LIMIT 1;
# 1 row in set (0.00 sec)
I've tried using different indexes on companies table:
index_companies_on_deleted_at
index_companies_on_post_index_id
index_companies_on_deleted_at_and_post_index_id
index_companies_on_post_index_id_and_deleted_at
The index_companies_on_deleted_at index is selected automatically by MySQL. Timings for the same query using each of the indexes above, in order:
5.6 sec
3.4 sec
8.5 sec
3.5 sec
Any ideas how to improve my query speed? Again, without the WHERE deleted_at IS NULL condition the query is instant.
The companies table has 1.3 million rows.
The post_indices table has 3k rows.
UPDATE 1:
ORDER BY post_indices.id is used for simplicity since it's already indexed. In practice the sort will be on other columns of the joined table (post_indices), so sorting on companies.post_index_id won't solve this issue.
UPDATE 2: for Rick James
Your query takes only 0.04 sec to complete, and EXPLAIN says that the index_companies_on_deleted_at_and_post_index_id index is used. So yes, it works better, but this doesn't solve my problem: I will need to order on post_indices columns in the future, and post_indices.id is used only to keep the example simple. Later it will be, for example, post_indices.city.
My query with WHERE, but without ORDER BY is instant.
UPDATE 3:
The EXPLAIN output. I also noticed that the order of the indexes matters: index_companies_on_deleted_at is used if it was created earlier than index_companies_on_deleted_at_and_post_index_id; otherwise the latter index is used (automatically selected by MySQL, I mean).
mysql> EXPLAIN SELECT * FROM companies INNER JOIN post_indices ON post_indices.id = companies.post_index_id WHERE companies.deleted_at IS NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+------------+--------+----------------------------------------------------------------------------------------------------------------+-------------------------------+---------+------------------------------------------------------+--------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+----------------------------------------------------------------------------------------------------------------+-------------------------------+---------+------------------------------------------------------+--------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | companies | NULL | ref | index_companies_on_post_index_id,index_companies_on_deleted_at,index_companies_on_deleted_at_and_post_index_id | index_companies_on_deleted_at | 6 | const | 638692 | 100.00 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | post_indices | NULL | eq_ref | PRIMARY | PRIMARY | 4 | enbro_purecrm_eu_development.companies.post_index_id | 1 | 100.00 | NULL |
+----+-------------+--------------+------------+--------+----------------------------------------------------------------------------------------------------------------+-------------------------------+---------+------------------------------------------------------+--------+----------+---------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM companies USE INDEX(index_companies_on_post_index_id) INNER JOIN post_indices ON post_indices.id = companies.post_index_id WHERE companies.deleted_at IS NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+------------+--------+----------------------------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+----------------------------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | companies | NULL | ALL | index_companies_on_post_index_id | NULL | NULL | NULL | 1277385 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | post_indices | NULL | eq_ref | PRIMARY | PRIMARY | 4 | enbro_purecrm_eu_development.companies.post_index_id | 1 | 100.00 | NULL |
+----+-------------+--------------+------------+--------+----------------------------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM companies USE INDEX(index_companies_on_deleted_at_and_post_index_id) INNER JOIN post_indices ON post_indices.id = companies.post_index_id WHERE companies.deleted_at IS NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+------------+--------+-------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------------+--------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+-------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------------+--------+----------+--------------------------------------------------------+
| 1 | SIMPLE | companies | NULL | ref | index_companies_on_deleted_at_and_post_index_id | index_companies_on_deleted_at_and_post_index_id | 6 | const | 638692 | 100.00 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | post_indices | NULL | eq_ref | PRIMARY | PRIMARY | 4 | enbro_purecrm_eu_development.companies.post_index_id | 1 | 100.00 | NULL |
+----+-------------+--------------+------------+--------+-------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------------+--------+----------+--------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
UPDATE 4:
I've removed unrelated columns:
| companies | CREATE TABLE `companies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`post_index_id` int(11) DEFAULT NULL,
`vat` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`note` text COLLATE utf8_unicode_ci,
`state` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'new',
`deleted_at` datetime DEFAULT NULL,
`lead_list_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_companies_on_vat` (`vat`),
KEY `index_companies_on_post_index_id` (`post_index_id`),
KEY `index_companies_on_state` (`state`),
KEY `index_companies_on_deleted_at` (`deleted_at`),
KEY `index_companies_on_deleted_at_and_post_index_id` (`deleted_at`,`post_index_id`),
KEY `index_companies_on_lead_list_id` (`lead_list_id`),
CONSTRAINT `fk_rails_5fc7f5c6b9` FOREIGN KEY (`lead_list_id`) REFERENCES `lead_lists` (`id`),
CONSTRAINT `fk_rails_79719355c6` FOREIGN KEY (`post_index_id`) REFERENCES `post_indices` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2523518 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
| post_indices | CREATE TABLE `post_indices` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`postal_code` int(11) DEFAULT NULL,
`group_part` int(11) DEFAULT NULL,
`group_number` int(11) DEFAULT NULL,
`group_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3101 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
UPDATE 5:
Another developer tested the same query on his local machine with exactly the same data set (dump/restore), and he got a totally different EXPLAIN:
mysql> explain SELECT * FROM companies INNER JOIN post_indices ON companies.post_index_id = post_indices.id WHERE companies.deleted_at is NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+-------+----------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------+----------------------------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+----------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------+----------------------------------------------------+------+-----------------------+
| 1 | SIMPLE | post_indices | index | PRIMARY | PRIMARY | 4 | NULL | 1 | NULL |
| 1 | SIMPLE | companies | ref | index_companies_on_post_index_id,index_companies_on_deleted_at,index_companies_on_deleted_at_and_post_index_id | index_companies_on_deleted_at_and_post_index_id | 11 | const,enbro_purecrm_eu_development.post_indices.id | 283 | Using index condition |
+----+-------------+--------------+-------+----------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------+----------------------------------------------------+------+-----------------------+
2 rows in set (0,00 sec)
The same query on his PC is instant. I have no idea why this is happening. I've also tried STRAIGHT_JOIN: when I force MySQL to read the post_indices table first, it is blazing fast too. But it is still a mystery to me why the same query is fast on another machine (MySQL 5.6.27) and slow on my machine (MySQL 5.7.10).
So it seems that the problem is MySQL choosing the wrong table to read first.
Does this work better?
SELECT * FROM companies AS c
INNER JOIN post_indices AS pi
ON c.post_index_id = pi.id
WHERE c.deleted_at is NULL
ORDER BY c.post_index_id -- Note
LIMIT 1;
INDEX(deleted_at, post_index_id) -- note
For that matter, how fast does it run with the WHERE, but without the ORDER BY?
The following optimizer hints should force MySQL to use the plan that your colleague observed:
SELECT * FROM post_indices
STRAIGHT_JOIN companies FORCE INDEX(index_companies_on_deleted_at_and_post_index_id)
ON companies.post_index_id = post_indices.id
WHERE companies.deleted_at is NULL
ORDER BY post_indices.id
LIMIT 1;
If you will be sorting on other columns of post_indices, you will need an index on those columns to make this plan work well.
Note that which plan is optimal depends on how often deleted_at is NULL. If deleted_at is frequently NULL, the above plan will be fast. If not, the above plan will have to run through many rows of post_indices before a match is found. Note also that for queries with OFFSET, the same plan may not be the most effective.
I think the issue here is that MySQL decides the join order without considering the effects of ORDER BY and LIMIT. In other words, it will choose the join order that it thinks is fastest to execute the full join.
Since there is a restriction on the companies table (deleted_at is NULL), I am not surprised that it will start with this table.
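For the case mentioned in the question, sorting on post_indices.city, the same forced plan might look like the sketch below. The index name index_post_indices_on_city is hypothetical; the idea is that MySQL can read post_indices in city order from that index and stop at the first row with a matching, non-deleted company.

```sql
-- Index on the sort column so rows can be read already ordered
ALTER TABLE post_indices ADD INDEX index_post_indices_on_city (city);

SELECT *
FROM post_indices
STRAIGHT_JOIN companies FORCE INDEX (index_companies_on_deleted_at_and_post_index_id)
    ON companies.post_index_id = post_indices.id
WHERE companies.deleted_at IS NULL
ORDER BY post_indices.city
LIMIT 1;
```

Whether the optimizer uses the city index for ordering should be checked with EXPLAIN: a plan without "Using temporary; Using filesort" would indicate that it does.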