I have a following table
CREATE TABLE `test_series_analysis_data` (
`email` varchar(255) NOT NULL,
`mappingId` int(11) NOT NULL,
`packageId` varchar(255) NOT NULL,
`sectionName` varchar(255) NOT NULL,
`createdAt` datetime(3) DEFAULT NULL,
`marksObtained` float NOT NULL,
`updatedAt` datetime DEFAULT NULL,
`testMetaData` longtext,
PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
KEY `mapping_package` (`mappingId`,`packageId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
Following is the output of the explain for the queries:
explain select rank
from (
select email, #i:=#i+1 as rank
from test_series_analysis_data ta
join (select #i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | rank_index,mapping_package | rank_index | 4 | const | 20160 | 1.00 | Using where; Using index |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
Query optimizer could have used both indexes but rank_index is a covering index so it got picked. What surprises me is the output of the following query:
explain select rank
from (
select email, #i:=#i+1 as rank
from test_series_analysis_data ta use index (mapping_package)
join (select #i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | mapping_package | mapping_package | 4 | const | 19434 | 1.00 | Using index condition |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
Why are there rows lesser (19434<20160) when the index being used is mapping_package. rank_index can better select what is required so the row count should be lesser in rank_index.
So does this mean mapping_package index is better than rank_index for the given query?
Does it have any effect that sectionName is a varchar so both indexes should give similar performance?
Also I am assuming Using index condition is selecting only few rows from index and scanning some more. While in case Using where; Using index, optimizer has to only read the index and not table to get rows and then it is selecting some data. Then why Using where is missing while using rank_index?
Moreover why is the key_len for mapping_package is 4 when there are only two columns in the index?
Help appreciated.
(19434<20160) -- Both of those numbers are estimates. It is unusual for them to be that close. I'll bet if you did ANALYZE TABLE, both would change, possibly changing the inequality.
Notice something else: Using where; Using index versus Using index condition.
But first, let me remind you that, in InnoDB, the PRIMARY KEY columns are tacked onto the secondary key. So, effectively you have
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`,`email`)
KEY `mapping_package` (`mappingId`,`packageId`,`email`,`sectionName`)
Now let's decide what the optimal index should be:
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
First, the = parts of WHERE: mappingId, packageId, sectionName, in any order;
Then the ORDER BY column(s): marksObtained
Bonus: Finally if email (the only other column mentioned anywhere in the SELECT) is in the key, it will be "Covering".
This says that rank_index is "perfect", and the other index is not so good. Alas, EXPLAIN does not clearly say that.
You, too, could have figured this out -- all you needed is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it's getting late, and I am getting cheeky.)
Other tips:
Don't blindly use (255). When a tmp table is needed, this can make the the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or...
If this is a huge table, you really ought to 'normalize' the strings, replacing them with maybe a 2-byte SMALLINT UNSIGNED. This will improve performance in other ways, such as decreasing costly I/O. (OK, 20 rows is pretty small, so this may not apply.)
Why is key_len 4? That implies that one column was used, namely the 4-byte INT mappingId. I would have expected it to use the second column, too. So, I am stumped. EXPLAIN FORMAT=JSON SELECT ... may provide more clues.
Related
MySQL version: 5.7.23
Engine: InnoDB
I created an application that monitors network devices from around the world with ICMP echo request packets. It pings devices on a regular interval and stores the results in a MySQL table.
I have a query that fetches the latest 100 up/down events for a given device, but it takes ~38 seconds to execute, which is way too long. I'm trying to optimize the query but I'm kind of lost.
The query:
select
c.id as clusterId,
c.name as cluster,
m.id as machineId,
m.label as machine,
h.id as pingResultId,
h.timePinged as `timestamp`,
h.status
from pinger_history h
join pinger_history_updown ud on ud.pingResultId = h.id
join pinger_machine_ip_addresses i on h.machineIpId = i.id
join pinger_machines m on i.machineId = m.id
join pinger_clusters c on m.clusterId = c.id
where h.deviceId = ?
order by h.id desc
limit 100
Explain query output:
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------------+--------+----------+----------------------------------------------+
| 1 | SIMPLE | ud | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 111239 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | h | NULL | eq_ref | PRIMARY,deviceId,machineIpId | PRIMARY | 4 | dashboard.ud.pingResultId | 1 | 5.00 | Using where |
| 1 | SIMPLE | i | NULL | eq_ref | PRIMARY,machineId | PRIMARY | 4 | dashboard.h.machineIpId | 1 | 100.00 | NULL |
| 1 | SIMPLE | m | NULL | eq_ref | PRIMARY,clusterId | PRIMARY | 4 | dashboard.i.machineId | 1 | 100.00 | Using where |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dashboard.m.clusterId | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------------+--------+----------+----------------------------------------------+
The pinger_history table consists of around 483,750,000 rows and pinger_history_updown around 115,520 rows. The other tables are small in comparison (less than 300 rows).
If anyone has experience in optimizing queries or debugging bottlenecks then all help will be greatly appreciated.
Edit:
I added the missing order by h.id desc to the query and I made pinger_history the first table in the query.
Here are the create table queries for pinger_history and pinger_history_updown:
pinger_history:
mysql> show create table pinger_history;
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pinger_history | CREATE TABLE `pinger_history` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`deviceId` int(10) unsigned NOT NULL,
`machineIpId` int(10) unsigned NOT NULL,
`minRoundTripTime` decimal(6,1) unsigned DEFAULT NULL,
`maxRoundTripTime` decimal(6,1) unsigned DEFAULT NULL,
`averageRoundTripTime` decimal(6,1) unsigned DEFAULT NULL,
`packetLossRatio` decimal(3,2) unsigned DEFAULT NULL,
`timePinged` datetime NOT NULL,
`status` enum('Up','Unstable','Down') DEFAULT NULL,
`firstOppositeStatusPingResultId` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `deviceId` (`deviceId`),
KEY `machineIpId` (`machineIpId`),
KEY `timePinged` (`timePinged`),
KEY `firstOppositeStatusPingResultId` (`firstOppositeStatusPingResultId`),
CONSTRAINT `pinger_history_ibfk_2` FOREIGN KEY (`machineIpId`) REFERENCES `pinger_machine_ip_addresses` (`id`),
CONSTRAINT `pinger_history_ibfk_4` FOREIGN KEY (`deviceId`) REFERENCES `pinger_devices` (`id`) ON DELETE CASCADE,
CONSTRAINT `pinger_history_ibfk_5` FOREIGN KEY (`firstOppositeStatusPingResultId`) REFERENCES `pinger_history` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=483833283 DEFAULT CHARSET=utf8mb4 |
+----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
pinger_history_updown:
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pinger_history_updown | CREATE TABLE `pinger_history_updown` (
`pingResultId` int(10) unsigned NOT NULL,
`notified` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`pingResultId`),
CONSTRAINT `pinger_history_updown_ibfk_1` FOREIGN KEY (`pingResultId`) REFERENCES `pinger_history` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Edit 2:
Here is the output of show index for pinger_history:
mysql> show index from pinger_history;
+----------------+------------+---------------------------------+--------------+---------------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+---------------------------------+--------------+---------------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pinger_history | 0 | PRIMARY | 1 | id | A | 443760800 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | deviceId | 1 | deviceId | A | 288388 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | machineIpId | 1 | machineIpId | A | 71598 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | timePinged | 1 | timePinged | A | 38041236 | NULL | NULL | | BTREE | | |
| pinger_history | 1 | firstOppositeStatusPingResultId | 1 | firstOppositeStatusPingResultId | A | 8973 | NULL | NULL | YES | BTREE | | |
+----------------+------------+---------------------------------+--------------+---------------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Edit 3:
Here is the explain output when I add straight_join:
Note that the query takes almost 2 minutes with straight_join but around 36 seconds without.
+----+-------------+-------+------------+--------+------------------------------+----------+---------+-------------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------+----------+---------+-------------------------+--------+----------+-------------+
| 1 | SIMPLE | h | NULL | ref | PRIMARY,deviceId,machineIpId | deviceId | 4 | const | 344062 | 100.00 | Using where |
| 1 | SIMPLE | ud | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dashboard.h.id | 1 | 100.00 | Using index |
| 1 | SIMPLE | i | NULL | eq_ref | PRIMARY,machineId | PRIMARY | 4 | dashboard.h.machineIpId | 1 | 100.00 | NULL |
| 1 | SIMPLE | m | NULL | eq_ref | PRIMARY,clusterId | PRIMARY | 4 | dashboard.i.machineId | 1 | 100.00 | Using where |
| 1 | SIMPLE | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | dashboard.m.clusterId | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------+----------+---------+-------------------------+--------+----------+-------------+
create index on following columns
pingResultId ,machineIpId ,clusterId .pinger_clusters .id,
pinger_machine_ip_addresses.id,pinger_history.id,pinger_machines.id,deviceId ,
pinger_history_updown.pingResultId
indexing will reduce the time taken to fetch the data
You wrote that your query fetches the latest 100 events for device, but there is no ORDER BY clause in your SQL.
Add ORDER BY h.id DESCto your query and create composite index on (devideId, id) fields.
I would add the "STRAIGHT_JOIN" keyword and also put the pinger_history into the first position. Then, include an index on pinger_history by the DeviceID to optimize the WHERE clause. Your other tables would probably already have an index on their respective ID keys implied and should be good. The STRAIGHT_JOIN clause tells MySQL to run the query in the table/join order I gave you, don't imply something else.
select STRAIGHT_JOIN
c.id as clusterId,
c.name as cluster,
m.id as machineId,
m.label as machine,
h.id as pingResultId,
h.timePinged as `timestamp`,
h.status
from
pinger_history h
join pinger_history_updown ud
on h.id = ud.pingResultId
join pinger_machine_ip_addresses i
on h.machineIpId = i.id
join pinger_machines m
on i.machineId = m.id
join pinger_clusters c
on m.clusterId = c.id
where
h.deviceId = ?
order by
h.id desc
limit 100
Since you DO want the most recent records, I would definitely have and index on your pinger_history table on (DeviceID, ID ) -- change your existing key of DeviceID only and change it to (DeviceID, ID)
This way, the WHERE clause is FIRST optimized to get the Device ID records. By having the ID as part of the index, but in the second position, the ORDER by can utilize that to get the most recent first for you.
Plan A: Get rid of pinger_history_updown and move notified into pinger_history. Perhaps augment status to indicate "CameUp" and "WentDown". Pro: That will make the query much faster since it will be able to use INDEX(deviceId). Con: It makes pinger_history a little bigger; adding columns to a huge table will take time.
Plan B: Add deviceId to pinger_history_updown and have INDEX(deviceID, pingResultId). Pro: Much faster query. Con: Redundant data (deviceid) is frowned on.
Plan C: Add an index hint to force the execution to start with pinger_history. Con: "What helps today may hurt tomorrow." (STRAIGHT_JOIN was tested and found to be slower.)
Plan D: See if ANALYZE TABLE for each table will help. Pro: Quick and cheap. Con: May not help.
Plan E: Change to ORDER BY deviceId DESC, id DESC. Pro: Cheap and easy to try. Con: May not help.
Plan F: In pinger_history, change
PRIMARY KEY (`id`),
KEY `deviceId` (`deviceId`),
to
PRIMARY KEY(deviceId, id),
KEY(id)
This will make the desired rows "clustered" much better. Pros: Much faster. Con: ALTER TABLE will take a long time for that huge table.
Plan G: Assume it is an explode-implode problem and move the LIMIT into a derived table:
select c.id as clusterId, c.name as cluster, m.id as machineId,
m.label as machine, h2.id as pingResultId, h2.timePinged as `timestamp`,
h2.status
FROM
( -- "derived table"
SELECT ud1.pingResultId
FROM pinger_history_updown AS ud1
JOIN pinger_history AS h1 ON ud1.pingResultId = h1.id
WHERE h1.deviceId = ?
ORDER BY ud1.pingResultId
LIMIT 100 -- only needed here
) AS ud2
JOIN pinger_history AS h2 ON ud2.pingResultId = h2.id
join pinger_machine_ip_addresses i ON h.machineIpId = i.id
join pinger_machines m ON i.machineId = m.id
join pinger_clusters c ON m.clusterId = c.id
order by h2.id desc -- Yes, this is repeated
Pro: May make better use of 'covering' INDEX(deviceId), especially if merged with Plan B.
Summary: Start with D and E.
Have found an inefficient query in our system. content holds versions of slides, and this is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That reduces the amount of rows down to one correctly, but doesnt really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use a TOP LIMIT 1 insetead of Max ?
m
MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as first column, I just guessed that's fk_content_slides_idx, if not, take another one.
The part FORCE INDEX FOR JOIN (fk_content_slides_idx) is just to enforce it, you should try if mysql takes it by itself without forcing (it should).
You might get even a slightly better result with an index (slide_id, version), it depends on the amount of data (e.g. the number of versions per id) if you see a difference (but you should not spam indexes, and you already have a lot on this table, but you can try it for fun.)
Just a suggestion i think you should avoid the group by slide_id because you are filter by one slide_id only (16901)
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
I am creating a table as
create table temp_test2 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
create table temp_test1 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id) limit 3;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | t | ALL | date_id | NULL | NULL | NULL | 4 | NULL |
| 1 | SIMPLE | t2 | ALL | date_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
why the code_id key is not used in both the table, but when I use code_id=something in on condition it's using the key,
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id and t.date_id =1) limit 3;
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | t | const | PRIMARY,date_id,date_id_2,date_id_3 | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | t2 | ref | date_id,date_id_2,date_id_3 | date_id | 4 | const | 1 | NULL |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
I tried (unique,composite primary,composite) key also but it is not working.
Can anyone explain why this so?
Because your tables contain a very small number of records, the optimiser decides that it is not worth using the index. A table scan will do just as good.
Also, you selected all fields (SELECT *), if it used the index for executing the JOIN a row scan would still be required to get the full contents.
The query would be more likely to use the index if:
you selected only the date_id field
there were more than 4 rows in temp_test
I have a table that looks like so:
ID | objectID | time | ...
-------------------
1 | 1 | ...
2 | 1 | ...
3 | 1 | ...
4 | 2 | ...
5 | 2 | ...
6 | 3 | ...
7 | 4 | ...
ID is the primary key, objectID is non-unique.
I am trying to increase performance on a query to get the most recent entries for all objectIDs, but the entries should not be newer than a certain value.
I tried to following two queries, which should both provide the same (and correct results):
SELECT *
FROM (
SELECT *
FROM table
WHERE time <= XXX
ORDER BY time DESC
)
GROUP BY objectID
AND
SELECT *
FROM table AS t
INNER JOIN (
SELECT ID, MAX(time)
FROM table
WHERE time <= 1353143351
GROUP BY objectID
) s
USING (ID)
An EXPLAIN for the first query tells me
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
------------------------------------------------------------------------------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 145827 | Using temporary; Using filesort
2 | DERIVED | tbl_test | ALL | NULL | NULL | NULL | NULL | 238694 | Using filesort
for the second query it says
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
------------------------------------------------------------------------------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 325
1 | PRIMARY | t | eq_ref | PRIMARY,ID | PRIMARY | 4 | s.ID | 1
2 | DERIVED | tbl_test | index | NULL | ID | 12 | NULL | 238694 | Using where; Using index; Using temporary; Using filesort
(tbl_test is my table for testing)
The second query seems to be (much) faster, but still not extreme fast with a runtime of 0.1 secs at ~200k DB entries.
Is there a way to increase performance of the query? Any missing indexes?
Thanks in advance!!
Edit: Explain for eggys query (see query in his post):
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 325
1 PRIMARY lab_case_test ALL NULL NULL NULL NULL 238694 Using where; Using join buffer
2 DERIVED lab_case_test index NULL ID 12 NULL 238694 Using where; Using index; Using temporary; Using f...
CREATE TABLE `lab_case_test` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`caseID` int(11) NOT NULL,
`time` int(11) NOT NULL,
// ....
PRIMARY KEY (`ID`),
KEY `ID` (`ID`,`caseID`,`time`),
KEY `caseID` (`caseID`,`time`)
) ENGINE=MyISAM AUTO_INCREMENT=238695 DEFAULT CHARSET=utf8
You want a composite index over (objectID, time):
ALTER TABLE my_table ADD INDEX (objectID, time)
The reason for this is that MySQL can then retrieve the maximum time for each objectID directly from the index tree; it can then also use the same index in joining against the table again to find the groupwise maximum records using something like your second query (but one should join on both objectID and time—I like to use a NATURAL JOIN in cases like this):
SELECT *
FROM my_table NATURAL JOIN (
SELECT objectID, MAX(time) time
FROM my_table
WHERE time <= 1353143351
GROUP BY objectID
) t
According to MySQL docs a composite index will still be used if the leftmost fields are part of the criteria. However, this table will not join correctly with the primary key; I had to add another index of the left two fields which is then used.
One of the tables is memory, and I know that by default memory uses a hash index which can't be used for group/order. However I'm using all rows of the memory table and not the index, so I don't think that relates to the problem.
What am I missing?
mysql> show create table pr_temp;
| pr_temp | CREATE TEMPORARY TABLE `pr_temp` (
`player_id` int(10) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`) USING BTREE,
KEY `insert_date` (`insert_date`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
mysql> show create table player_game_record;
| player_tank_record | CREATE TABLE `player_game_record` (
`player_id` int(10) unsigned NOT NULL,
`game_id` smallint(5) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`,`game_id`),
KEY `insert_date` (`insert_date`),
KEY `player_date` (`player_id`,`insert_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 DATA DIRECTORY='...' INDEX DIRECTORY='...' |
mysql> explain select pgr.* from player_game_record pgr inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY,insert_date,player_date | player_date | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 21 | |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
mysql> explain select pgr.* from player_game_record pgr force index (primary) inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY | PRIMARY | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 2873031 | |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
2 rows in set (0.00 sec)
I think the primary key should work, with the two left columns (player_id, insert_date) being used. However it will use the player_date index by default, and if I force it to use the primary index it looks like it's only using one field rather than both.
Update2: Mysql version 5.5.27-log
Update3:
(note this is after removing the player_date index while trying some other tests)
mysql> show indexes in player_game_record;
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_game_record | 0 | PRIMARY | 1 | player_id | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 2 | insert_date | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 3 | game_id | A | 576276246 | NULL | NULL | | BTREE | | |
| player_game_record | 1 | insert_date | 1 | insert_date | A | 33304 | NULL | NULL | | BTREE | | |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (1.08 sec)
mysql> select count(*) from player_game_record;
+-----------+
| count(*) |
+-----------+
| 576276246 |
+-----------+
1 row in set (0.00 sec)
I agree that your use of the MEMORY storage engine for one of the tables should not at all be an issue here, since we're talking about the other table.
I also agree that the leftmost prefix of an index can be used exactly how you are trying to use it, and I cannot think of any reason why the primary key could not be used in exactly the same way as any other index.
This has been a head-scratcher. The new index you created "should" be the same as the left side of the primary key, so why don't they behave the same way? I have two thoughts, both of which lead me to the same recommendation, even though I am not as familiar with the internals of MyISAM as I am with InnoDB. (As an aside, I'd recommend InnoDB over MyISAM.)
The index on your primary key was presumably on the table when you began inserting data, while the new index was added while most or all of the data was already there. This suggests that your new index is nice and cleanly-organized internally, while your primary key index may be highly fragmented, having been built as the data was loaded.
The row count the optimizer shows is based on index statistics, which may be inaccurate on your primary key due to the insert order.
The fragmentation theory may explain why querying with the primary key as your index is not as fast; the index statistics theory may explain why the optimizer comes up with such a different row count and it may explain why the optimizer might have been choosing a full table scan instead of using that index (which is only a guess, since we don't have the explain available).
The thing I would suggest based on these two thoughts is running OPTIMIZE TABLE on your table. If it took 12 hours to build that new index, then optimizing the table may very possibly take that long or longer.
Possibly helpful: http://www.dbasquare.com/2012/07/09/data-fragmentation-problem-in-mysql-myisam/