Why does an INNER JOIN ON condition not use the key? (MySQL)

I am creating two tables as follows:
create table temp_test2 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
create table temp_test1 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id) limit 3;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | t | ALL | date_id | NULL | NULL | NULL | 4 | NULL |
| 1 | SIMPLE | t2 | ALL | date_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
Why is the date_id key not used for either table? When I add date_id = something to the ON condition, the key is used:
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id and t.date_id =1) limit 3;
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | t | const | PRIMARY,date_id,date_id_2,date_id_3 | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | t2 | ref | date_id,date_id_2,date_id_3 | date_id | 4 | const | 1 | NULL |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
I also tried unique, composite primary, and composite keys, but the result is the same.
Can anyone explain why this is so?

Because your tables contain a very small number of records, the optimiser decides that it is not worth using the index. A table scan will do just as well.
Also, you selected all fields (SELECT *). Even if the index were used to execute the JOIN, the rows themselves would still have to be read to get the full contents.
The query would be more likely to use the index if:
you selected only the date_id field
there were more than 4 rows in temp_test
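Both points can be checked with EXPLAIN. The following is only a rough sketch that reuses the table names from the question's query; how many rows are needed before the plan changes depends on the server and its statistics, so no particular output is promised here.

```sql
-- Select only the join column, so the index alone could answer the query.
EXPLAIN
SELECT t.date_id
FROM temp_test AS t
INNER JOIN temp_test2 AS t2 ON t2.date_id = t.date_id
LIMIT 3;

-- After loading more rows, refresh the optimizer statistics
-- and compare the plan for the original SELECT * query.
ANALYZE TABLE temp_test, temp_test2;

EXPLAIN
SELECT *
FROM temp_test AS t
INNER JOIN temp_test2 AS t2 ON t2.date_id = t.date_id
LIMIT 3;
```

Comparing the type and key columns of the two plans shows whether the optimiser switched from a full scan (ALL) to an index access.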

Related

MySQL InnoDB count(*) performance

CREATE TABLE `app_user` (
`uid` int NOT NULL,
`uname` varchar(20) NOT NULL,
`upwd` varchar(20) DEFAULT NULL,
PRIMARY KEY (`uid`),
UNIQUE KEY `uname` (`uname`)
) ENGINE=InnoDB
This is the table definition I used for testing, and I inserted a million records into it. When I use the following SQL to count the rows, it takes 20 seconds to execute.
select count(*) from app_user;
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | app_user | NULL | index | NULL | uid | 4 | NULL | 996948 | 100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
In this case, every record's uid is greater than 0, so I can replace the first statement with this one:
select count(*) from app_user where uid > 0; -- in this case, all uid > 0
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | app_user | NULL | range | PRIMARY,uid | PRIMARY | 4 | NULL | 498474 | 100.00 | Using where; Using index |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
It takes only 500 milliseconds. Why does this happen?
I want to know why the second statement executes so much faster.

Debugging slow MySQL query with EXPLAIN

I have found an inefficient query in our system. The content table holds versions of slides, and the query is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That correctly reduces the number of rows to one, but it doesn't really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Should I use TOP / LIMIT 1 instead of MAX?
MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as its first column. I just guessed that this is fk_content_slides_idx; if it is not, use another one.
The FORCE INDEX FOR JOIN (fk_content_slides_idx) part is only there to enforce it. You should check whether MySQL picks that index by itself without forcing (it should).
You might get a slightly better result with an index on (slide_id, version). Whether you see a difference depends on the amount of data, for example the number of versions per id. You should not pile up indexes, and you already have a lot on this table, but it is worth a try.
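A minimal sketch of trying the (slide_id, version) index, assuming no index with that column order exists yet. The index name idx_slide_version is made up for illustration; only the column order matters.

```sql
-- Hypothetical index name; slide_id first, then version.
ALTER TABLE `content` ADD INDEX `idx_slide_version` (`slide_id`, `version`);

-- Run the join without FORCE INDEX and see which key the optimizer picks.
EXPLAIN
SELECT `content`.*
FROM `content`
JOIN (
  SELECT `slide_id`, MAX(`version`) AS `version`
  FROM `content`
  WHERE `slide_id` = '16901'
  GROUP BY `slide_id`
) c ON c.`slide_id` = `content`.`slide_id`
   AND c.`version` = `content`.`version`;

-- If it does not help, drop it again to avoid piling up indexes.
-- ALTER TABLE `content` DROP INDEX `idx_slide_version`;
```

The key column of the EXPLAIN output for the outer content row shows whether the new index is chosen over the existing ones.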
Just a suggestion: I think you should drop the GROUP BY slide_id, because you are filtering on a single slide_id (16901) anyway.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
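On the LIMIT 1 idea raised in the question, one possible form is sketched below. It assumes that (slide_id, version) is unique, as the question states, so the row with the highest version is well defined; whether it is actually faster on this table has not been measured here.

```sql
-- Fetch the newest version of one slide directly, without a derived table.
-- Assumes at most one row per (slide_id, version) pair.
SELECT `content`.*
FROM `content`
WHERE `slide_id` = '16901'
ORDER BY `version` DESC
LIMIT 1;
```

With an index whose leading columns are (slide_id, version), MySQL can in principle read a single index entry for this query instead of computing MAX first and joining back.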

SQL MAX and GROUP BY with WHERE

Given the following table:
CREATE TABLE `test` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`device_id` INT(11) UNSIGNED NOT NULL,
`distincted` BIT(1) NOT NULL DEFAULT b'0',
`timestamp_detected` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `idx1` (`device_id`),
INDEX `idx2` (`device_id`, `timestamp_detected`),
CONSTRAINT `test_ibfk_1` FOREIGN KEY (`device_id`) REFERENCES `device` (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
ROW_FORMAT=COMPACT;
I want to perform a groupwise max on timestamp_detected grouped by device_id with the following:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id FROM test as lh1,
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id FROM test GROUP BY device_id) as lh2
WHERE lh1.timestamp_detected = lh2.max_timestamp_detected
AND lh1.device_id = lh2.device_id;
This yields the following results when run with explain:
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using where |
| 1 | PRIMARY | lh1 | ref | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 9 | lh2.device_id,lh2.max_timestamp_detected | 1 | Using index |
| 2 | DERIVED | test | range | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 4 | NULL | 15 | Using index for group-by |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
Now there is a requirement that only those rows with distincted = 1 should be included in the results. I modified the query to the following:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id FROM test as lh1,
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id FROM test WHERE distincted = 1 GROUP BY device_id) as lh2
WHERE lh1.timestamp_detected = lh2.max_timestamp_detected
AND lh1.device_id = lh2.device_id;
It returns the correct results; however, it seems to take longer. Running an EXPLAIN yields the following:
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 860 | Using where |
| 1 | PRIMARY | lh1 | ref | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 9 | lh2.device_id,lh2.max_timestamp_detected | 1 | Using index |
| 2 | DERIVED | test | index | FK_location_history_device,device_id_timestamp_detected | FK_location_history_device | 4 | NULL | 860 | Using where |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
I tried adding the distincted column to index idx2 to no avail. How can I optimize this query?
The query is:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1 JOIN
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id
FROM test
WHERE distincted = 1
GROUP BY device_id
) as lh2
on lh1.timestamp_detected = lh2.max_timestamp_detected AND
lh1.device_id = lh2.device_id;
For this query, I would suggest indexes on test(distincted, device_id, timestamp_detected) and test(device_id, timestamp_detected).
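As a sketch, the first suggested index could be created as follows. The index name is hypothetical. The second one, on (device_id, timestamp_detected), already exists as idx2 in the table definition from the question, so it is not created again.

```sql
-- Hypothetical index name. With distincted as the leading column, the
-- derived query can filter on distincted = 1 and then group by device_id
-- within the same index.
CREATE INDEX idx_distincted_device_ts
    ON test (distincted, device_id, timestamp_detected);

-- Check whether the derived table now uses the new index.
EXPLAIN
SELECT MAX(timestamp_detected) AS max_timestamp_detected, device_id
FROM test
WHERE distincted = 1
GROUP BY device_id;
```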
I also wonder if you would get better performance with this equivalent query:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1
WHERE distincted = 1 AND
NOT EXISTS (SELECT 1
FROM test t
WHERE t.distincted = 1 AND
t.device_id = lh1.device_id AND
t.timestamp_detected > lh1.timestamp_detected
);
And these two indexes: test(distincted) and test(device_id, timestamp_detected, distincted).
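The two indexes for the NOT EXISTS variant could be added like this. Both index names are made up for illustration, and whether this variant beats the join has to be measured on the real data.

```sql
-- Hypothetical index names.
CREATE INDEX idx_distincted ON test (distincted);
CREATE INDEX idx_device_ts_distincted
    ON test (device_id, timestamp_detected, distincted);

-- Inspect the plan of the NOT EXISTS form with the new indexes in place.
EXPLAIN
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1
WHERE lh1.distincted = 1
  AND NOT EXISTS (SELECT 1
                  FROM test t
                  WHERE t.distincted = 1
                    AND t.device_id = lh1.device_id
                    AND t.timestamp_detected > lh1.timestamp_detected);
```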

Why is MySQL not using this index with a GROUP BY query?

I have this big table (about a million records) and I'm trying to retrieve the last record of each type.
The table, the index and the query are very simple, and the fact that MySQL is not using the index means I must be overlooking something.
The table looks like this:
CREATE TABLE `MyTable001` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`TypeField` int(11) NOT NULL,
`Value` bigint(20) NOT NULL,
`Timestamp` bigint(20) NOT NULL,
`AnotherField1` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_MyTable001_TypeField` (`TypeField`),
KEY `idx_MyTable001_Timestamp` (`Timestamp`)
) ENGINE=MyISAM
Show Index gives this:
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| MyTable001 | 0 | PRIMARY | 1 | id | A | 626141 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_TypeField | 1 | TypeField | A | 458 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_Timestamp | 1 | Timestamp | A | 156535 | NULL | NULL | | BTREE | | |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
But when I execute EXPLAIN for the following query:
SELECT *
FROM MyTable001
GROUP BY TypeField
ORDER BY id DESC
The result is this:
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | MyTable001 | ALL | NULL | NULL | NULL | NULL | 626141 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
Why won't MySQL use idx_MyTable001_TypeField?
Thanks in advance.
The problem is that the contents of the fields not in the GROUP BY are still being inspected. Therefore, all rows must be read, and a full table scan is the better choice. This is clearly seen with the following examples:
SELECT TypeField, COUNT(*) FROM MyTable001 GROUP BY TypeField uses the index.
SELECT TypeField, COUNT(id) FROM MyTable001 GROUP BY TypeField does not.
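The difference can be observed by running EXPLAIN on both statements. This sketch only shows how to compare them; the plans themselves may vary with the MySQL version and storage engine.

```sql
-- Only TypeField is needed, so idx_MyTable001_TypeField can answer this alone.
EXPLAIN
SELECT TypeField, COUNT(*)
FROM MyTable001
GROUP BY TypeField;

-- COUNT(id) refers to a column that is not part of idx_MyTable001_TypeField,
-- so the rows themselves have to be read.
EXPLAIN
SELECT TypeField, COUNT(id)
FROM MyTable001
GROUP BY TypeField;
```

Compare the key and Extra columns of the two results.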
The original query was incorrect. The correct query is:
SELECT l.*
FROM MyTable001 l
JOIN (
SELECT MAX(id) m_id
FROM MyTable001 l
GROUP BY l.TypeField) l_id ON l_id.m_id = l.id;
It takes 260ms in a table with 630k records. Joachim Isaksson's and fancyPants' alternatives took several minutes in my tests.

Order by not picking up the right indexes

I have a projects table, and each project has multiple categories assigned. The category mapping is stored in the project_category table. I want to list all recent projects that are not expired. Here are the schema, indexes and query.
Schema
Create table projects (
project_id Bigint UNSIGNED NOT NULL AUTO_INCREMENT,
project_title Varchar(300) NOT NULL,
date_added Datetime NOT NULL,
is_expired Bit(1) NOT NULL DEFAULT false,
Primary Key (project_id)) ENGINE = InnoDB;
Create table project_category (
project_category_id Int UNSIGNED NOT NULL AUTO_INCREMENT,
cat_id Int UNSIGNED NOT NULL,
project_id Bigint UNSIGNED NOT NULL,
Primary Key (project_category_id)) ENGINE = InnoDB;
Indexes
CREATE INDEX project_listing ON projects (is_expired, date_added);
Create INDEX category_mapping_IDX ON project_category (project_id,cat_id);
Query
mysql> EXPLAIN
SELECT P.project_id
FROM projects P
INNER JOIN project_category C USING (project_id)
WHERE P.is_expired=false
AND C.cat_id=17
ORDER BY P.date_added DESC LIMIT 27840,10;
+----+-------------+-------+--------+--------------------------------------------+---------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------+---------+---------+-------------------------+--------+---------------------------------+
| 1 | SIMPLE | C | ref | project_id,cat_id,category_mapping_IDX | cat_id | 4 | const | 185088 | Using temporary; Using filesort |
| 1 | SIMPLE | P | eq_ref | PRIMARY,is_expired_INX,project_listing_IDX | PRIMARY | 8 | freelancer.C.project_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------------------------+---------+---------+-------------------------+--------+---------------------------------+
I am wondering why MySQL isn't using the index on project_category, and why it is doing a full sort?
I also tried the following query just to avoid file sorting, but it is not working either.
mysql> EXPLAIN
SELECT P.project_id
FROM projects P,
(
SELECT P.project_id
FROM projects P
INNER JOIN project_category C USING (project_id)
WHERE C.cat_id=17
) F
WHERE F.project_id=P.project_id
AND P.is_expired=FALSE
LIMIT 10;
+----+-------------+------------+--------+--------------------------------------------+---------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------------------------+---------+---------+-------------------------+--------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 110920 | |
| 1 | PRIMARY | P | eq_ref | PRIMARY,is_expired_INX,project_listing_IDX | PRIMARY | 8 | F.project_id | 1 | Using where |
| 2 | DERIVED | C | ref | project_id,cat_id,category_mapping_IDX | cat_id | 4 | | 185088 | |
| 2 | DERIVED | P | eq_ref | PRIMARY | PRIMARY | 8 | freelancer.C.project_id | 1 | Using index |
+----+-------------+------------+--------+--------------------------------------------+---------+---------+-------------------------+--------+-------------+
Your problem is here:
Create INDEX category_mapping_IDX ON project_category (project_id,cat_id);
This index is not useful when you are trying to select a single cat_id, because cat_id is not the first part of the index. Think of the index as a concatenated string of its columns, sorted from the left, and you can see why it cannot be used to look up a value in the second column alone. Swap the order:
Create INDEX category_mapping_IDX ON project_category (cat_id, project_id);
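Since an index named category_mapping_IDX already exists, it has to be dropped before it can be recreated with the new column order. A minimal sketch of the swap, followed by a check that a lookup by cat_id can now use it:

```sql
-- Replace the (project_id, cat_id) index with one that leads with cat_id.
DROP INDEX category_mapping_IDX ON project_category;
CREATE INDEX category_mapping_IDX ON project_category (cat_id, project_id);

-- cat_id is now the leftmost column, so this lookup can use the index,
-- and project_id can be read from the same index entry.
EXPLAIN
SELECT project_id
FROM project_category
WHERE cat_id = 17;
```

If other queries look up rows by project_id alone, they would need a separate index that leads with project_id; whether one exists beyond the project_id key shown in the EXPLAIN output is an assumption to verify.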