MySQL version: 5.7.23
Engine: InnoDB
I created an application that monitors network devices around the world with ICMP echo request packets. It pings devices at regular intervals and stores the results in a MySQL table.
I have a query that fetches the latest 100 up/down events for a given device, but it takes ~38 seconds to execute, which is way too long. I'm trying to optimize the query but I'm kind of lost.
The query:
select
c.id as clusterId,
c.name as cluster,
m.id as machineId,
m.label as machine,
h.id as pingResultId,
h.timePinged as `timestamp`,
h.status
from pinger_history h
join pinger_history_updown ud on ud.pingResultId = h.id
join pinger_machine_ip_addresses i on h.machineIpId = i.id
join pinger_machines m on i.machineId = m.id
join pinger_clusters c on m.clusterId = c.id
where h.deviceId = ?
order by h.id desc
limit 100
Explain query output:
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------------+--------+----------+----------------------------------------------+
| 1 | SIMPLE | ud | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 111239 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | h | NULL | eq_ref | PRIMARY,deviceId,machineIpId | PRIMARY | 4 | dashboard.ud.pingResultId | 1 | 5.00 | Using where |
| 1 | SIMPLE | i | NULL | eq_ref | PRIMARY,machineId | PRIMARY | 4 | dashboard.h.machineIpId | 1 | 100.00 | NULL |
| 1 | SIMPLE | m | NULL | eq_ref | PRIMARY,clusterId | PRIMARY | 4 | dashboard.i.machineId | 1 | 100.00 | Using where |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dashboard.m.clusterId | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------------+--------+----------+----------------------------------------------+
The pinger_history table consists of around 483,750,000 rows and pinger_history_updown around 115,520 rows. The other tables are small in comparison (less than 300 rows).
If anyone has experience optimizing queries or debugging bottlenecks, any help would be greatly appreciated.
Edit:
I added the missing order by h.id desc to the query and I made pinger_history the first table in the query.
Here are the CREATE TABLE statements for pinger_history and pinger_history_updown:
pinger_history:
mysql> show create table pinger_history;
CREATE TABLE `pinger_history` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`deviceId` int(10) unsigned NOT NULL,
`machineIpId` int(10) unsigned NOT NULL,
`minRoundTripTime` decimal(6,1) unsigned DEFAULT NULL,
`maxRoundTripTime` decimal(6,1) unsigned DEFAULT NULL,
`averageRoundTripTime` decimal(6,1) unsigned DEFAULT NULL,
`packetLossRatio` decimal(3,2) unsigned DEFAULT NULL,
`timePinged` datetime NOT NULL,
`status` enum('Up','Unstable','Down') DEFAULT NULL,
`firstOppositeStatusPingResultId` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `deviceId` (`deviceId`),
KEY `machineIpId` (`machineIpId`),
KEY `timePinged` (`timePinged`),
KEY `firstOppositeStatusPingResultId` (`firstOppositeStatusPingResultId`),
CONSTRAINT `pinger_history_ibfk_2` FOREIGN KEY (`machineIpId`) REFERENCES `pinger_machine_ip_addresses` (`id`),
CONSTRAINT `pinger_history_ibfk_4` FOREIGN KEY (`deviceId`) REFERENCES `pinger_devices` (`id`) ON DELETE CASCADE,
CONSTRAINT `pinger_history_ibfk_5` FOREIGN KEY (`firstOppositeStatusPingResultId`) REFERENCES `pinger_history` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=483833283 DEFAULT CHARSET=utf8mb4
pinger_history_updown:
CREATE TABLE `pinger_history_updown` (
`pingResultId` int(10) unsigned NOT NULL,
`notified` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`pingResultId`),
CONSTRAINT `pinger_history_updown_ibfk_1` FOREIGN KEY (`pingResultId`) REFERENCES `pinger_history` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Edit 2:
Here is the output of show index for pinger_history:
mysql> show index from pinger_history;
+----------------+------------+---------------------------------+--------------+---------------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+---------------------------------+--------------+---------------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pinger_history | 0 | PRIMARY | 1 | id | A | 443760800 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | deviceId | 1 | deviceId | A | 288388 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | machineIpId | 1 | machineIpId | A | 71598 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | timePinged | 1 | timePinged | A | 38041236 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | firstOppositeStatusPingResultId | 1 | firstOppositeStatusPingResultId | A | 8973 | NULL | NULL | YES | BTREE | | |
+----------------+------------+---------------------------------+--------------+---------------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Edit 3:
Here is the explain output when I add straight_join:
Note that the query takes almost 2 minutes with straight_join but around 36 seconds without.
+----+-------------+-------+------------+--------+------------------------------+----------+---------+-------------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------+----------+---------+-------------------------+--------+----------+-------------+
| 1 | SIMPLE | h | NULL | ref | PRIMARY,deviceId,machineIpId | deviceId | 4 | const | 344062 | 100.00 | Using where |
| 1 | SIMPLE | ud | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dashboard.h.id | 1 | 100.00 | Using index |
| 1 | SIMPLE | i | NULL | eq_ref | PRIMARY,machineId | PRIMARY | 4 | dashboard.h.machineIpId | 1 | 100.00 | NULL |
| 1 | SIMPLE | m | NULL | eq_ref | PRIMARY,clusterId | PRIMARY | 4 | dashboard.i.machineId | 1 | 100.00 | Using where |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dashboard.m.clusterId | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------+----------+---------+-------------------------+--------+----------+-------------+
Create indexes on the following columns: pingResultId, machineIpId, clusterId, pinger_clusters.id, pinger_machine_ip_addresses.id, pinger_history.id, pinger_machines.id, deviceId and pinger_history_updown.pingResultId.
Indexing will reduce the time taken to fetch the data.
You wrote that your query fetches the latest 100 events for a device, but there is no ORDER BY clause in your SQL.
Add ORDER BY h.id DESC to your query and create a composite index on the (deviceId, id) columns.
I would add the STRAIGHT_JOIN keyword and also put pinger_history in the first position. Then, include an index on pinger_history on deviceId to optimize the WHERE clause. Your other tables probably already have indexes on their respective ID keys and should be fine. STRAIGHT_JOIN tells MySQL to join the tables in the order I listed them instead of choosing its own order.
select STRAIGHT_JOIN
c.id as clusterId,
c.name as cluster,
m.id as machineId,
m.label as machine,
h.id as pingResultId,
h.timePinged as `timestamp`,
h.status
from
pinger_history h
join pinger_history_updown ud
on h.id = ud.pingResultId
join pinger_machine_ip_addresses i
on h.machineIpId = i.id
join pinger_machines m
on i.machineId = m.id
join pinger_clusters c
on m.clusterId = c.id
where
h.deviceId = ?
order by
h.id desc
limit 100
Since you DO want the most recent records, I would definitely have an index on your pinger_history table on (deviceId, id): change your existing key on deviceId alone to (deviceId, id).
This way, the WHERE clause is FIRST optimized to find the rows for the device. Because id is part of the index, in the second position, the ORDER BY can use it to return the most recent rows first.
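To make the (deviceId, id) idea concrete, here is a toy Python model, not InnoDB code: the secondary index is a sorted list of (deviceId, id) tuples, and finding a device's newest rows is one binary search followed by a backwards walk, with no sort step. The data and the helper names (`build_index`, `latest_ids`) are hypothetical.

```python
from bisect import bisect_right

def build_index(rows):
    """Model a (deviceId, id) B-tree as a sorted list of tuples; rows are (id, deviceId)."""
    return sorted((device_id, row_id) for row_id, device_id in rows)

def latest_ids(index, device_id, limit):
    """Return up to `limit` ids for device_id, newest first, by walking the index backwards.

    No sort step is needed: inside one deviceId the entries are already ordered by id.
    """
    pos = bisect_right(index, (device_id, float("inf"))) - 1  # last entry for this device
    result = []
    while pos >= 0 and index[pos][0] == device_id and len(result) < limit:
        result.append(index[pos][1])
        pos -= 1
    return result

# Hypothetical data: 3 devices pinged round-robin, ids 1..30.
rows = [(i, (i - 1) % 3 + 1) for i in range(1, 31)]
index = build_index(rows)
print(latest_ids(index, device_id=2, limit=4))  # [29, 26, 23, 20]
```

The walk stops as soon as it has `limit` entries or leaves the device's range, which is the behaviour the answer hopes the ORDER BY ... LIMIT can get from the index.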
Plan A: Get rid of pinger_history_updown and move notified into pinger_history. Perhaps augment status to indicate "CameUp" and "WentDown". Pro: That will make the query much faster since it will be able to use INDEX(deviceId). Con: It makes pinger_history a little bigger; adding columns to a huge table will take time.
Plan B: Add deviceId to pinger_history_updown and have INDEX(deviceID, pingResultId). Pro: Much faster query. Con: Redundant data (deviceid) is frowned on.
Plan C: Add an index hint to force the execution to start with pinger_history. Con: "What helps today may hurt tomorrow." (STRAIGHT_JOIN was tested and found to be slower.)
Plan D: See if ANALYZE TABLE for each table will help. Pro: Quick and cheap. Con: May not help.
Plan E: Change to ORDER BY deviceId DESC, id DESC. Pro: Cheap and easy to try. Con: May not help.
Plan F: In pinger_history, change
PRIMARY KEY (`id`),
KEY `deviceId` (`deviceId`),
to
PRIMARY KEY(deviceId, id),
KEY(id)
This will make the desired rows "clustered" much better. Pros: Much faster. Con: ALTER TABLE will take a long time for that huge table.
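A rough way to see why Plan F helps is to count how many pages the 100 wanted rows live on under each clustering. The sketch below is a simplified model, assuming a made-up 100 rows per page and 50 devices pinged round-robin; real InnoDB pages, fill factors and row sizes differ, and `pages_touched` is a hypothetical helper.

```python
def pages_touched(stored_order, wanted, rows_per_page):
    """Count the distinct pages that hold the `wanted` rows, given the physical row order."""
    position = {row: pos for pos, row in enumerate(stored_order)}
    return len({position[row] // rows_per_page for row in wanted})

# Hypothetical workload: 50 devices pinged round-robin, 10,000 rows, 100 rows per page.
N_DEVICES, N_ROWS, ROWS_PER_PAGE = 50, 10_000, 100
rows = [((i - 1) % N_DEVICES + 1, i) for i in range(1, N_ROWS + 1)]  # (deviceId, id)

# The latest 100 rows for device 7: the rows the query actually needs.
wanted = sorted((r for r in rows if r[0] == 7), key=lambda r: r[1], reverse=True)[:100]

by_id = sorted(rows, key=lambda r: r[1])  # physical order under PRIMARY KEY(id)
by_device_then_id = sorted(rows)          # physical order under PRIMARY KEY(deviceId, id)

print(pages_touched(by_id, wanted, ROWS_PER_PAGE))              # 50
print(pages_touched(by_device_then_id, wanted, ROWS_PER_PAGE))  # 1
```

Under PRIMARY KEY(id) the device's rows are interleaved with every other device's, so each page holds only a couple of them; under PRIMARY KEY(deviceId, id) they sit next to each other.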
Plan G: Assume it is an explode-implode problem and move the LIMIT into a derived table:
select c.id as clusterId, c.name as cluster, m.id as machineId,
m.label as machine, h2.id as pingResultId, h2.timePinged as `timestamp`,
h2.status
FROM
( -- "derived table"
SELECT ud1.pingResultId
FROM pinger_history_updown AS ud1
JOIN pinger_history AS h1 ON ud1.pingResultId = h1.id
WHERE h1.deviceId = ?
ORDER BY ud1.pingResultId DESC
LIMIT 100 -- only needed here
) AS ud2
JOIN pinger_history AS h2 ON ud2.pingResultId = h2.id
join pinger_machine_ip_addresses i ON h2.machineIpId = i.id
join pinger_machines m ON i.machineId = m.id
join pinger_clusters c ON m.clusterId = c.id
order by h2.id desc -- Yes, this is repeated
Pro: May make better use of 'covering' INDEX(deviceId), especially if merged with Plan B.
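The explode-implode idea can be illustrated by counting lookups. In this hypothetical sketch, `lookup` stands in for the eq_ref joins to i, m and c; joining first and limiting afterwards calls it once per matching row, while picking the 100 ids first (the derived table) calls it only 100 times. It models the case where the optimizer cannot stop early; the function and class names are made up for illustration.

```python
def join_then_limit(ids, lookup, limit):
    # "Explode": every matching row is joined before the LIMIT is applied.
    joined = [(row_id, lookup(row_id)) for row_id in ids]
    joined.sort(reverse=True)  # ORDER BY id DESC
    return joined[:limit]

def limit_then_join(ids, lookup, limit):
    # The derived table keeps only the newest `limit` ids...
    top = sorted(ids, reverse=True)[:limit]
    # ...and only those few rows are joined to the other tables.
    return [(row_id, lookup(row_id)) for row_id in top]

class CountingLookup:
    """Stand-in for the joins to the small tables; counts how often it is called."""
    def __init__(self):
        self.calls = 0

    def __call__(self, row_id):
        self.calls += 1
        return f"cluster-{row_id % 5}"

ids = list(range(1, 5001))  # hypothetical: 5,000 history rows match the WHERE clause
eager, lazy = CountingLookup(), CountingLookup()
a = join_then_limit(ids, eager, 100)
b = limit_then_join(ids, lazy, 100)
print(a == b, eager.calls, lazy.calls)  # True 5000 100
```

Both paths return the same 100 rows; only the amount of join work differs.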
Summary: Start with D and E.
I have the following table:
CREATE TABLE `test_series_analysis_data` (
`email` varchar(255) NOT NULL,
`mappingId` int(11) NOT NULL,
`packageId` varchar(255) NOT NULL,
`sectionName` varchar(255) NOT NULL,
`createdAt` datetime(3) DEFAULT NULL,
`marksObtained` float NOT NULL,
`updatedAt` datetime DEFAULT NULL,
`testMetaData` longtext,
PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
KEY `mapping_package` (`mappingId`,`packageId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here is the EXPLAIN output for the queries:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | rank_index,mapping_package | rank_index | 4 | const | 20160 | 1.00 | Using where; Using index |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
The query optimizer could have used either index, but rank_index is a covering index, so it was picked. What surprises me is the output of the following query:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta use index (mapping_package)
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | mapping_package | mapping_package | 4 | const | 19434 | 1.00 | Using index condition |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
Why is the row estimate lower (19434 < 20160) when the index being used is mapping_package? rank_index can select more precisely what is required, so the row count should be lower with rank_index.
So does this mean the mapping_package index is better than rank_index for this query?
Does the fact that sectionName is a varchar have any effect, such that both indexes would give similar performance?
Also, I am assuming that Using index condition means only a few rows are selected from the index and some more are scanned, whereas with Using where; Using index the optimizer only has to read the index (not the table) to get the rows and then filters them. Then why is Using where missing when mapping_package is used instead of rank_index?
Moreover, why is the key_len for mapping_package 4 when there are two columns in the index?
Help appreciated.
(19434<20160) -- Both of those numbers are estimates. It is unusual for them to be that close. I'll bet if you did ANALYZE TABLE, both would change, possibly changing the inequality.
Notice something else: Using where; Using index versus Using index condition.
But first, let me remind you that, in InnoDB, the PRIMARY KEY columns are tacked onto the secondary key. So, effectively you have
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`,`email`)
KEY `mapping_package` (`mappingId`,`packageId`,`email`,`sectionName`)
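The rule above is mechanical enough to write down. A small sketch, with a hypothetical helper name, that appends the PRIMARY KEY columns not already present in a secondary index, in PRIMARY KEY order (it ignores column-prefix indexes):

```python
def effective_secondary_key(secondary, primary):
    """Columns an InnoDB secondary index effectively holds: its own columns,
    followed by any PRIMARY KEY columns it does not already contain."""
    return list(secondary) + [col for col in primary if col not in secondary]

PK = ["email", "mappingId", "packageId", "sectionName"]
print(effective_secondary_key(["mappingId", "packageId", "sectionName", "marksObtained"], PK))
# ['mappingId', 'packageId', 'sectionName', 'marksObtained', 'email']
print(effective_secondary_key(["mappingId", "packageId"], PK))
# ['mappingId', 'packageId', 'email', 'sectionName']
```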
Now let's decide what the optimal index should be:
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
First, the = parts of WHERE: mappingId, packageId, sectionName, in any order;
Then the ORDER BY column(s): marksObtained
Bonus: Finally if email (the only other column mentioned anywhere in the SELECT) is in the key, it will be "Covering".
This says that rank_index is "perfect", and the other index is not so good. Alas, EXPLAIN does not clearly say that.
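The three steps can be sketched as a tiny function. It is a simplification under the assumptions of this query only: all WHERE tests are ANDed `=` comparisons and the ORDER BY columns all go in one direction; range tests are not handled. The function name is hypothetical.

```python
def recommend_index(equality_cols, order_by_cols, other_cols):
    """'=' columns first, then the ORDER BY columns, then the remaining columns for covering."""
    index = []
    for col in [*equality_cols, *order_by_cols, *other_cols]:
        if col not in index:  # a column only needs to appear once
            index.append(col)
    return index

print(recommend_index(["mappingId", "packageId", "sectionName"], ["marksObtained"], ["email"]))
# ['mappingId', 'packageId', 'sectionName', 'marksObtained', 'email']
```

The result matches the effective rank_index listed earlier, which is the point of the answer.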
You, too, could have figured this out -- all you needed is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it's getting late, and I am getting cheeky.)
Other tips:
Don't blindly use (255). When a tmp table is needed, this can make the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or...
If this is a huge table, you really ought to 'normalize' the strings, replacing them with maybe a 2-byte SMALLINT UNSIGNED. This will improve performance in other ways, such as decreasing costly I/O. (OK, 20 rows is pretty small, so this may not apply.)
Why is key_len 4? That implies that one column was used, namely the 4-byte INT mappingId. I would have expected it to use the second column, too. So, I am stumped. EXPLAIN FORMAT=JSON SELECT ... may provide more clues.
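The key_len arithmetic can be checked with a small calculator. This sketch covers only INT and VARCHAR under MySQL 5.7 rules: an INT is 4 bytes, a VARCHAR contributes its maximum character count times the charset's maximum bytes per character plus a 2-byte length prefix, and a NULL-able column adds 1 byte. If packageId had been used too, key_len would be 4 + 767 = 771; the same formula gives the 767 shown for <auto_key0>, which is built on email VARCHAR(255) in utf8. The helper name is hypothetical.

```python
# Maximum bytes per character for a few MySQL 5.7 character sets (utf8 = utf8mb3).
CHARSET_MAX_BYTES = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def key_part_len(col_type, length=None, charset="utf8", nullable=False):
    """Bytes one index column contributes to EXPLAIN's key_len (INT and VARCHAR only)."""
    if col_type == "INT":
        size = 4
    elif col_type == "VARCHAR":
        # Worst-case byte length of the characters plus a 2-byte length prefix.
        size = length * CHARSET_MAX_BYTES[charset] + 2
    else:
        raise ValueError(f"type not modelled in this sketch: {col_type}")
    return size + (1 if nullable else 0)  # a NULL-able column adds a 1-byte NULL flag

mapping_id = key_part_len("INT")                  # mappingId int(11) NOT NULL
package_id = key_part_len("VARCHAR", length=255)  # packageId varchar(255) NOT NULL, utf8
print(mapping_id, package_id, mapping_id + package_id)  # 4 767 771
```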
I have found an inefficient query in our system. The content table holds versions of slides, and this query is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system, because the outer query does not filter by slide_id. After adding that filter I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That correctly reduces the number of rows to one, but doesn't really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use a TOP/LIMIT 1 instead of MAX?
MySQL seems to use an index on (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as its first column. I just guessed that's fk_content_slides_idx; if not, use another one.
The FORCE INDEX FOR JOIN (fk_content_slides_idx) part only enforces it; check whether MySQL picks it by itself without forcing (it should).
You might get a slightly better result with an index on (slide_id, version). Whether you see a difference depends on the amount of data (e.g. the number of versions per id). You should not spam indexes, and you already have a lot on this table, but you can try it for fun.
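Why an index on (slide_id, version) helps can be shown with a toy model: the index is a sorted list of (slide_id, version) tuples, so the highest version of one slide is the last entry in that slide's range, found by a single binary search. The data and helper name are hypothetical, and slide_id is treated as an integer here although the query compares it to a string.

```python
from bisect import bisect_left

def latest_version(index, slide_id):
    """Highest version of slide_id from a sorted (slide_id, version) index, or None.

    The slide's entries are contiguous and ordered by version, so the maximum is the
    entry just before the next slide_id starts: one search, no scan, no GROUP BY.
    """
    end = bisect_left(index, (slide_id + 1,))  # first entry of the next slide_id
    if end == 0 or index[end - 1][0] != slide_id:
        return None  # slide_id not present
    return index[end - 1][1]

# Hypothetical content rows (slide_id, version).
index = sorted([(16900, 1), (16900, 2), (16901, 1), (16901, 2), (16901, 5), (16902, 3)])
print(latest_version(index, 16901))  # 5
```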
Just a suggestion: I think you should drop the GROUP BY slide_id, because you are filtering by only one slide_id (16901).
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
I am creating the tables as follows:
create table temp_test2 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
create table temp_test1 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id) limit 3;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | t | ALL | date_id | NULL | NULL | NULL | 4 | NULL |
| 1 | SIMPLE | t2 | ALL | date_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
Why is the date_id key not used in either table, while it is used when I add date_id = something to the ON condition?
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id and t.date_id =1) limit 3;
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | t | const | PRIMARY,date_id,date_id_2,date_id_3 | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | t2 | ref | date_id,date_id_2,date_id_3 | date_id | 4 | const | 1 | NULL |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
I also tried unique, composite primary, and composite keys, but it still doesn't work.
Can anyone explain why this so?
Because your tables contain a very small number of records, the optimiser decides that it is not worth using the index. A table scan will do just as well.
Also, you selected all fields (SELECT *). Even if it used the index to execute the JOIN, it would still need to read each row to get the full contents.
The query would be more likely to use the index if:
you selected only the date_id field
there were more than 4 rows in temp_test
I have a projects table, and each project has multiple categories assigned. The category mapping is stored in the project_category table. I want to list all recent projects that are not expired. Here are the schema, indexes and query.
Schema
Create table projects (
project_id Bigint UNSIGNED NOT NULL AUTO_INCREMENT,
project_title Varchar(300) NOT NULL,
date_added Datetime NOT NULL,
is_expired Bit(1) NOT NULL DEFAULT false,
Primary Key (project_id)) ENGINE = InnoDB;
Create table project_category (
project_category_id Int UNSIGNED NOT NULL AUTO_INCREMENT,
cat_id Int UNSIGNED NOT NULL,
project_id Bigint UNSIGNED NOT NULL,
Primary Key (project_category_id)) ENGINE = InnoDB;
Indexes
CREATE INDEX project_listing ON projects (is_expired, date_added);
Create INDEX category_mapping_IDX ON project_category (project_id,cat_id);
Query
mysql> EXPLAIN
SELECT P.project_id
FROM projects P
INNER JOIN project_category C USING (project_id)
WHERE P.is_expired=false
AND C.cat_id=17
ORDER BY P.date_added DESC LIMIT 27840,10;
+----+-------------+-------+--------+--------------------------------------------+---------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------+---------+---------+-------------------------+--------+---------------------------------+
| 1 | SIMPLE | C | ref | project_id,cat_id,category_mapping_IDX | cat_id | 4 | const | 185088 | Using temporary; Using filesort |
| 1 | SIMPLE | P | eq_ref | PRIMARY,is_expired_INX,project_listing_IDX | PRIMARY | 8 | freelancer.C.project_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------------------------+---------+---------+-------------------------+--------+---------------------------------+
I am wondering why MySQL isn't using category_mapping_IDX on project_category, and why it is doing a filesort.
I also tried the following query just to avoid file sorting, but it is not working either.
mysql> EXPLAIN
SELECT P.project_id
FROM projects P,
(
SELECT P.project_id
FROM projects P
INNER JOIN project_category C USING (project_id)
WHERE C.cat_id=17
) F
WHERE F.project_id=P.project_id
AND P.is_expired=FALSE
LIMIT 10;
+----+-------------+------------+--------+--------------------------------------------+---------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------------------------+---------+---------+-------------------------+--------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 110920 | |
| 1 | PRIMARY | P | eq_ref | PRIMARY,is_expired_INX,project_listing_IDX | PRIMARY | 8 | F.project_id | 1 | Using where |
| 2 | DERIVED | C | ref | project_id,cat_id,category_mapping_IDX | cat_id | 4 | | 185088 | |
| 2 | DERIVED | P | eq_ref | PRIMARY | PRIMARY | 8 | freelancer.C.project_id | 1 | Using index |
+----+-------------+------------+--------+--------------------------------------------+---------+---------+-------------------------+--------+-------------+
Your problem is here:
Create INDEX category_mapping_IDX ON project_category (project_id,cat_id);
This index is not useful when you're trying to subselect a single cat_id, because cat_id is not the first part of the index. Think of the index as a concatenated string and you can see why it cannot be used. Swap the order:
Create INDEX category_mapping_IDX ON project_category (cat_id, project_id);
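The "concatenated string" intuition can be checked with a small model. This is a minimal Python sketch, assuming a composite index behaves like a list of tuples sorted column by column. InnoDB really uses a B+tree, but the ordering is the same. The rows are hypothetical.

With (cat_id, project_id), every entry for cat_id = 17 sits in one contiguous range that a binary search can find. With (project_id, cat_id), those entries are scattered, so every entry must be inspected.

```python
import bisect

# Hypothetical sample rows: (project_id, cat_id) pairs from a mapping table.
rows = [(pid, pid % 5 + 15) for pid in range(1, 21)]

# Models an index on (project_id, cat_id): sorted by project_id first.
idx_project_first = sorted(rows)

# Models an index on (cat_id, project_id): sorted by cat_id first.
idx_cat_first = sorted((cat, pid) for pid, cat in rows)


def range_lookup(index, cat_id):
    """cat_id is the leading column: binary-search one contiguous range.

    Returns the matching project_ids and the number of entries read
    from that range (the O(log n) probes are not counted)."""
    lo = bisect.bisect_left(index, (cat_id,))
    hi = bisect.bisect_left(index, (cat_id + 1,))
    return [pid for _, pid in index[lo:hi]], hi - lo


def full_scan(index, cat_id):
    """cat_id is the second column: every entry has to be checked."""
    matches = [pid for pid, cat in index if cat == cat_id]
    return matches, len(index)


seek_ids, seek_examined = range_lookup(idx_cat_first, 17)
scan_ids, scan_examined = full_scan(idx_project_first, 17)
print(seek_ids, seek_examined)
print(scan_ids, scan_examined)
```

Both lookups return the same project_ids. Only the cat_id-first layout confines the work to the matching slice.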
(Edited) For more details about the app itself, please also see:
Simple but heavy application consuming a lot of resources. How to Optimize?
(The adopted solution was to use both joins and full-text search.)
I have the following query running over roughly 500,000 rows in 25 seconds. If I remove the ORDER BY, it takes 0.5 seconds.
First test
Keeping the ORDER BY and removing all t. and tu. columns, the query takes 7 seconds.
Second test
If I add or remove an INDEX on the i.created_at field, the response time remains the same.
QUERY:
**EDITED: I'VE NOTICED THAT BOTH GROUP BY AND ORDER BY SLOW DOWN THE QUERY (I've also achieved a small gain by changing the joins, bringing it down to 10 seconds, but overall the problem remains). With the modification, EXPLAIN has stopped reporting filesort, but it still reports "using temporary" **
SELECT SQL_NO_CACHE
DISTINCT `i`.`id`,
`i`.`entity`,
`i`.`created_at`,
`i`.`collected_at`,
`t`.`status_id` AS `twt_status_id`,
`t`.`user_id` AS `twt_user_id`,
`t`.`content` AS `twt_content`,
`tu`.`id` AS `twtu_id`,
`tu`.`screen_name` AS `twtu_screen_name`,
`tu`.`profile_image` AS `twtu_profile_image`
FROM `mtrt_items` AS `i`
LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id
LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
LEFT JOIN `twt_users` AS `tu` ON u.id = tu.id
INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
INNER JOIN `account_clients` AS `c` ON g.client_id =c.id
ORDER BY `i`.`created_at` DESC
LIMIT 100 OFFSET 0
Here is the EXPLAIN (EDITED):
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
| 1 | SIMPLE | c | index | PRIMARY | PRIMARY | 4 | NULL | 1 | Using index; Using temporary |
| 1 | SIMPLE | g | ref | PRIMARY,client_id | client_id | 4 | clubr_new.c.id | 3 | Using index |
| 1 | SIMPLE | sg | ref | group_id,search_id | group_id | 4 | clubr_new.g.id | 1 | Using index |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | clubr_new.sg.search_id | 1 | Using index |
| 1 | SIMPLE | r | ref | search_id,item_id | search_id | 4 | clubr_new.s.id | 4359 | Using where |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 8 | clubr_new.r.item_id | 1 | |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 8 | clubr_new.i.user_id | 1 | Using index |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | clubr_new.i.id | 1 | |
| 1 | SIMPLE | tu | eq_ref | PRIMARY | PRIMARY | 8 | clubr_new.u.id | 1 | |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
Here is the mtrt_items table:
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| entity | enum('twitter','facebook','youtube','flickr','orkut') | NO | MUL | NULL | |
| user_id | bigint(20) | NO | MUL | NULL | |
| created_at | datetime | NO | MUL | NULL | |
| collected_at | datetime | NO | | NULL | |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
CREATE TABLE `mtrt_items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`entity` enum('twitter','facebook','youtube','flickr','orkut') COLLATE utf8_unicode_ci NOT NULL,
`user_id` bigint(20) NOT NULL,
`created_at` datetime NOT NULL,
`collected_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `mtrt_user_id` (`user_id`),
KEY `entity` (`entity`),
KEY `created_at` (`created_at`),
CONSTRAINT `mtrt_items_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `mtrt_users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=309650 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The twt_tweets_content table is MyISAM and is also used for full-text searches:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| user_id | int(11) | NO | MUL | NULL | |
| status_id | varchar(100) | NO | MUL | NULL | |
| content | varchar(200) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
Instead of placing the ORDER BY in the main query, wrap the query and sort the outer result, like so:
SELECT * FROM (
... your query
) AS q ORDER BY q.`created_at` DESC
Take a look at the query plan. You will find that in your case the sort is performed on the mtrt_items table before the outer joins are performed. In the rewrite I've partially provided, the sort is applied after the outer joins, on a much smaller set.
UPDATE
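Assuming that the LIMIT is being applied to a large set (500,000?), it looks like you can take the top 100 rows from mtrt_items before doing any of the joins. This gives the same result only if every one of those items survives the INNER JOINs below.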
SELECT * from (
SELECT
`id`, ... `created_at`, ...
FROM `mtrt_items`
ORDER BY `created_at` DESC
LIMIT 100 OFFSET 0) as i
LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id
LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
LEFT JOIN `twt_users` AS `tu` ON t.user_id = tu.id
INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
INNER JOIN `account_clients` AS `c` ON g.client_id =c.id
GROUP BY i.id
ORDER BY i.created_at DESC
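Whether "limit first, join later" is safe depends on the INNER JOINs never dropping any of the top rows. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The tables items and item_searches are hypothetical, heavily simplified versions of mtrt_items and mtrt_items_searches. The sketch compares both query shapes, then breaks the assumption on purpose.

```python
import sqlite3

# Hypothetical stand-ins for mtrt_items and mtrt_items_searches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.execute("CREATE INDEX items_created_at ON items (created_at)")
conn.execute("CREATE TABLE item_searches (item_id INTEGER NOT NULL, search_id INTEGER NOT NULL)")

for item_id in range(1, 1001):
    # created_at is a shuffled 0..999 value, so id order != time order.
    conn.execute("INSERT INTO items VALUES (?, ?)", (item_id, item_id * 7919 % 1000))
    conn.execute("INSERT INTO item_searches VALUES (?, 1)", (item_id,))
    if item_id % 2 == 0:  # some items match two searches -> duplicate join rows
        conn.execute("INSERT INTO item_searches VALUES (?, 2)", (item_id,))

# Original shape: join everything, then sort and take the top 10.
JOIN_FIRST = """
    SELECT i.id, i.created_at
    FROM items AS i
    JOIN item_searches AS r ON r.item_id = i.id
    GROUP BY i.id
    ORDER BY i.created_at DESC
    LIMIT 10"""

# Suggested shape: take the top 10 items first, then join only those.
LIMIT_FIRST = """
    SELECT i.id, i.created_at
    FROM (SELECT id, created_at FROM items ORDER BY created_at DESC LIMIT 10) AS i
    JOIN item_searches AS r ON r.item_id = i.id
    GROUP BY i.id
    ORDER BY i.created_at DESC"""

joined_first = conn.execute(JOIN_FIRST).fetchall()
limited_first = conn.execute(LIMIT_FIRST).fetchall()
print(joined_first == limited_first)  # True: every item has at least one match

# Remove the newest item's matches: the INNER JOIN now filters it out.
conn.execute("DELETE FROM item_searches WHERE item_id = ?", (limited_first[0][0],))
print(len(conn.execute(JOIN_FIRST).fetchall()),
      len(conn.execute(LIMIT_FIRST).fetchall()))  # 10 9
```

GROUP BY collapses the duplicate rows produced by multiple matches. The two shapes agree only while each of the top N items has at least one matching row in every INNER JOINed table. Otherwise the limit-first version returns fewer rows.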
Don't include the VARCHAR/TEXT fields in your initial query. Without them, MySQL can keep the TEMPORARY table it needs for sorting in memory (MEMORY engine), which can increase efficiency dramatically. You can fetch the text fields afterwards with a second query that needs no sorting, just a condition on the PRIMARY KEY field, and merge the data in your script (assuming that you are using one).
Also get rid of any JOINs (INNER or OUTER) that you don't actually take any data from.
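The two-step approach can be sketched in Python with the built-in sqlite3 module standing in for MySQL. The items and tweets tables are hypothetical simplifications of mtrt_items and twt_tweets_content. The sort touches only narrow columns. The text is then fetched by primary key and merged in the script, keeping the order from the first query.

```python
import sqlite3

# Hypothetical stand-ins: a narrow items table and a table holding text columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
for item_id in range(1, 51):
    # created_at is a shuffled 0..49 value so id order != time order.
    conn.execute("INSERT INTO items VALUES (?, ?)", (item_id, item_id * 37 % 50))
    conn.execute("INSERT INTO tweets VALUES (?, ?)", (item_id, f"tweet #{item_id}"))

# Step 1: sort and limit using only narrow, fixed-width columns.
top = conn.execute(
    "SELECT id, created_at FROM items ORDER BY created_at DESC LIMIT 5"
).fetchall()
ids = [item_id for item_id, _ in top]

# Step 2: fetch the text columns by primary key; no ORDER BY needed.
placeholders = ",".join("?" * len(ids))
texts = dict(conn.execute(
    f"SELECT id, content FROM tweets WHERE id IN ({placeholders})", ids
).fetchall())

# Step 3: merge in the application, preserving the order from step 1.
result = [(item_id, created_at, texts.get(item_id)) for item_id, created_at in top]
for row in result:
    print(row)
```

The second query looks rows up directly by primary key. Its cost depends on the 100 (here 5) ids rather than on the size of the sorted set.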