MySQL InnoDB count(*) performance

CREATE TABLE `app_user` (
`uid` int NOT NULL,
`uname` varchar(20) NOT NULL,
`upwd` varchar(20) DEFAULT NULL,
PRIMARY KEY (`uid`),
UNIQUE KEY `uname` (`uname`)
) ENGINE=InnoDB
This is the table definition I used for testing; I inserted a million records into it. When I count the rows with the following SQL, it takes 20 seconds to execute:
select count(*) from app_user;
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | app_user | NULL | index | NULL | uid | 4 | NULL | 996948 | 100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
In this case, every record's uid is greater than 0, so I can replace the first query with this one:
select count(*) from app_user where uid > 0; -- in this case, all uid > 0
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | app_user | NULL | range | PRIMARY,uid | PRIMARY | 4 | NULL | 498474 | 100.00 | Using where; Using index |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
This one takes only 500 milliseconds. Why does this happen?
I want to know why the second query executes so much faster.

Related

MySQL index does not work if I select all fields

I have a simple table like this,
CREATE TABLE `domain` (
`id` varchar(191) NOT NULL,
`time` bigint(20) DEFAULT NULL,
`task_id` bigint(20) DEFAULT NULL,
`name` varchar(512) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_domain_time` (`time`),
KEY `idx_domain_task_id` (`task_id`),
FULLTEXT KEY `idx_domain_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
And indexed like this:
mysql> show index from domain;
+--------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+--------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| domain | 0 | PRIMARY | 1 | id | A | 2036092 | NULL | NULL | | BTREE | | | NO |
| domain | 1 | idx_domain_name | 1 | name | NULL | NULL | NULL | NULL | YES | FULLTEXT |
+--------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
Index is used when I select only the id field:
mysql> explain SELECT id FROM `domain` WHERE task_id = '3';
+------+-------------+--------+------+--------------------+--------------------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+------+--------------------+--------------------+---------+-------+---------+-------------+
| 1 | SIMPLE | domain | ref | idx_domain_task_id | idx_domain_task_id | 9 | const | 1018046 | Using index |
+------+-------------+--------+------+--------------------+--------------------+---------+-------+---------+-------------+
1 row in set (0.00 sec)
When I select all fields, it does not work:
mysql> explain SELECT * FROM `domain` WHERE task_id = '3';
+------+-------------+--------+------+--------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+------+--------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | domain | ALL | idx_domain_task_id | NULL | NULL | NULL | 2036092 | Using where |
+------+-------------+--------+------+--------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT id, name FROM `domain` WHERE task_id = '3';
+------+-------------+--------+------+--------------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+------+--------------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | domain | ALL | idx_domain_task_id | NULL | NULL | NULL | 2036092 | Using where |
+------+-------------+--------+------+--------------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)
What's wrong?
Indexes other than the Primary Key work by storing data for the indexed field(s) in index order, along with the primary key.
So when you SELECT the primary key by the indexed field, there is enough information in the index to completely satisfy the query. When you add other fields, there's no longer enough information in the index. That doesn't mean the database won't use the index, but now it's no longer as much of a slam dunk, and it comes down more to table statistics.
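The MySQL optimizer tries to pick the cheapest plan, so it may ignore an index when it estimates that a table scan costs less. If you are sure the index will perform better, you can steer the optimizer with an index hint. USE INDEX restricts it to the named index, though it may still choose a table scan; FORCE INDEX additionally treats a table scan as very expensive. For example: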
SELECT * FROM `domain` USE INDEX (idx_domain_task_id) WHERE task_id = '3';
For more details, see the Index Hints page of the MySQL manual.
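As a rough sketch of how to check whether a hint actually helps (assuming the `domain` table from the question; nothing here is guaranteed to be faster), you could refresh the statistics and compare the plans with and without FORCE INDEX. Note that the EXPLAIN output in the question estimates that about half of the rows match task_id = '3', in which case a full scan can genuinely be the cheaper choice.

```sql
-- Refresh the index statistics the optimizer bases its estimates on.
ANALYZE TABLE `domain`;

-- Plan chosen by the optimizer on its own.
EXPLAIN SELECT * FROM `domain` WHERE task_id = 3;

-- Plan with the secondary index forced; compare the key and rows columns.
EXPLAIN SELECT * FROM `domain` FORCE INDEX (idx_domain_task_id) WHERE task_id = 3;
```

If the forced plan is not measurably faster when you run the real query, it is probably better to leave the choice to the optimizer.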

MySql: Optimizing Subquery with index_merge

I want to list all users together with the status of their newsletter subscription. Since someone can be subscribed to the newsletter without being a user, I'm doing something like:
SELECT user.id, user.email,
(SELECT newsletter.status FROM newsletter
WHERE newsletter.email=user.email OR newsletter.user = user.id) AS status
FROM user
WHERE...
Indexes are user.id, user.email, newsletter.email, newsletter.user.
The OR makes the query incredibly slow. I found in the question "Union as sub query MySQL" that you can do an "index merge", which will speed up the query. But I'm not sure how to force MySQL to do an index merge in my case. Any ideas?
Added: 2nd row of explain doesn't use a key.
DROP TABLE if exists user;
DROP TABLE if exists newsletter;
create table user (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`email` varchar(255) DEFAULT NULL, INDEX email(email)
) ENGINE=InnoDB;
create table newsletter (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`status` enum('subscribed','unsubscribed') DEFAULT NULL,
`email` varchar(255) DEFAULT NULL, INDEX email(email),
`user` int(10) unsigned DEFAULT NULL, INDEX user(user)
) ENGINE=InnoDB;
EXPLAIN SELECT user.id, user.email,
(SELECT newsletter.status FROM newsletter
WHERE newsletter.email=user.email OR newsletter.user = user.id) AS status
FROM user;
+----+--------------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| 1 | PRIMARY | user | NULL | index | NULL | email | 768 | NULL | 1 | 100.00 | Using index |
| 2 | DEPENDENT SUBQUERY | newsletter | NULL | ALL | email,user | NULL | NULL | NULL | 1 | 100.00 | Range checked for each record (index map: 0x6) |
+----+--------------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
EXPLAIN SELECT user.id, user.email, newsletter.status
FROM user
LEFT JOIN newsletter ON (newsletter.email=user.email OR newsletter.user = user.id);
+----+-------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
| 1 | SIMPLE | user | NULL | index | NULL | email | 768 | NULL | 1 | 100.00 | Using index |
| 1 | SIMPLE | newsletter | NULL | ALL | email,user | NULL | NULL | NULL | 1 | 100.00 | Range checked for each record (index map: 0x6) |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+------+----------+------------------------------------------------+
Unless you have switched off index_merge, MySQL will use it if it thinks it will be of benefit. But I find it kicks in less often than one might expect, and even when it does, it's not very helpful — the query is still a lot slower than using an index in the conventional way.
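To check whether index_merge is switched on at all, something like the following should work. This is a minimal sketch; the flag names are the standard optimizer_switch flags, and changing them only affects the current session.

```sql
-- Show the current optimizer flags; look for index_merge=on in the output.
SELECT @@optimizer_switch;

-- Re-enable index merge for the current session if it was turned off.
SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on';
```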
The typical solution for these types of queries is to do a UNION of two simpler queries.
explain select u.id, u.email, n.status
from user u left join newsletter n on u.email=n.email
union
select u.id, u.email, n.status
from user u left join newsletter n on u.id=n.user;
+----+--------------+------------+-------+---------------+-------+---------+--------------+------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+-------+---------+--------------+------+-----------------+
| 1 | PRIMARY | u | index | NULL | email | 403 | NULL | 1 | Using index |
| 1 | PRIMARY | n | ref | email | email | 403 | test.u.email | 1 | NULL |
| 2 | UNION | u | index | NULL | email | 403 | NULL | 1 | Using index |
| 2 | UNION | n | ref | user | user | 5 | test.u.id | 1 | NULL |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+-------+---------------+-------+---------+--------------+------+-----------------+
You'd be better off fixing your data so you only have to join on the user id, not on the email.

MySQL does not use the primary key inside a SELECT ... ORDER BY query. Why?

I have this MySQL table:
CREATE TABLE `maillog` (
`Id` varchar(200) NOT NULL DEFAULT 'Test',
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `email` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=cp1251 ROW_FORMAT=DYNAMIC;
Now, I want to execute this query - but it's very slow for many rows:
SELECT Id FROM `maillog` ORDER BY `Id`;
Why does MySQL not use the primary key?
If I run EXPLAIN for this query, the result shows a NULL value for Key:
mysql> EXPLAIN SELECT Id FROM `maillog` ORDER BY `Id`;
+----+-------------+---------+------------+--------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+--------+---------------+------+---------+------+------+----------+-------+
| 1 | SIMPLE | maillog | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+---------+------------+--------+---------------+------+---------+------+------+----------+-------+
But the DESCRIBE query for this table, shows the Key PRI for the Id column:
mysql> DESCRIBE `maillog`;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| Id | varchar(200) | NO | PRI | Test | |
| email | varchar(255) | YES | MUL | NULL | |
+-------+--------------+------+-----+---------+-------+
I think it is very strange to give a primary key column a default value.
I don't even understand how MySQL allows it.
Try removing the default value; a primary key value should always be supplied explicitly.
You have only 1 row in your table, so the query engine doesn't bother loading the index; it just does a sequential read of the table. So yes, "no index".
Try adding a few more rows and this should change (1 row in maillog, 2 in maillog2):
SQL Fiddle
MySQL 5.6 Schema Setup:
Query 1:
explain SELECT Id FROM `maillog` ORDER BY `Id`
Results:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|---------|--------|---------------|--------|---------|--------|------|--------|
| 1 | SIMPLE | maillog | system | (null) | (null) | (null) | (null) | 1 | (null) |
Query 2:
explain SELECT Id FROM `maillog2` ORDER BY `Id`
Results:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|----|-------------|----------|-------|---------------|---------|---------|--------|------|-------------|
| 1 | SIMPLE | maillog2 | index | (null) | PRIMARY | 202 | (null) | 2 | Using index |
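To reproduce this on the original table, a sketch along these lines may help. The inserted values are made up for illustration, and the exact plan you get can differ by MySQL version.

```sql
-- Add more rows so the table no longer qualifies as a single-row "system" table.
-- The Id and email values below are hypothetical sample data.
INSERT INTO `maillog` (`Id`, `email`) VALUES
  ('a', 'a@example.com'),
  ('b', 'b@example.com');

-- Re-check the plan; with several rows the key column may now show PRIMARY.
EXPLAIN SELECT Id FROM `maillog` ORDER BY `Id`;
```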

SQL MAX and GROUP BY with WHERE

Given the following table:
CREATE TABLE `test` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`device_id` INT(11) UNSIGNED NOT NULL,
`distincted` BIT(1) NOT NULL DEFAULT b'0',
`timestamp_detected` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `idx1` (`device_id`),
INDEX `idx2` (`device_id`, `timestamp_detected`),
CONSTRAINT `test_ibfk_1` FOREIGN KEY (`device_id`) REFERENCES `device` (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
ROW_FORMAT=COMPACT;
I want to perform a groupwise max on timestamp_detected grouped by device_id with the following:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id FROM test as lh1,
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id FROM test GROUP BY device_id) as lh2
WHERE lh1.timestamp_detected = lh2.max_timestamp_detected
AND lh1.device_id = lh2.device_id;
This yields the following results when run with explain:
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using where |
| 1 | PRIMARY | lh1 | ref | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 9 | lh2.device_id,lh2.max_timestamp_detected | 1 | Using index |
| 2 | DERIVED | test | range | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 4 | NULL | 15 | Using index for group-by |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
Now there is a requirement that only those rows with distincted = 1 should be included in the results. I modified the query to the following:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id FROM test as lh1,
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id FROM test WHERE distincted = 1 GROUP BY device_id) as lh2
WHERE lh1.timestamp_detected = lh2.max_timestamp_detected
AND lh1.device_id = lh2.device_id;
It returns the results correctly however it seems to take longer. Running an explain yields the following:
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 860 | Using where |
| 1 | PRIMARY | lh1 | ref | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 9 | lh2.device_id,lh2.max_timestamp_detected | 1 | Using index |
| 2 | DERIVED | test | index | FK_location_history_device,device_id_timestamp_detected | FK_location_history_device | 4 | NULL | 860 | Using where |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
I tried adding the distincted column to index idx2 to no avail. How can I optimize this query?
The query is:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1 JOIN
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id
FROM test
WHERE distincted = 1
GROUP BY device_id
) as lh2
on lh1.timestamp_detected = lh2.max_timestamp_detected AND
lh1.device_id = lh2.device_id;
For this query, I would suggest indexes on test(distincted, device_id, timestamp_detected) and test(device_id, timestamp_detected).
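As a sketch, the first suggested index could be created like this. The index name is a hypothetical one picked for illustration, and the second suggested index, test(device_id, timestamp_detected), already exists as idx2 in the table definition from the question.

```sql
-- Hypothetical index name; covers the WHERE filter, the GROUP BY column
-- and the MAX() column of the derived table.
ALTER TABLE test
  ADD INDEX idx_distincted_device_ts (distincted, device_id, timestamp_detected);
```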
I also wonder if you would get better performance with this equivalent query:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1
WHERE distincted = 1 AND
NOT EXISTS (SELECT 1
FROM test t
WHERE t.distincted = 1 AND
t.device_id = lh1.device_id AND
t.timestamp_detected > lh1.timestamp_detected
);
And these two indexes: test(distincted) and test(device_id, timestamp_detected, distincted).
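A possible way to add these two indexes, again with hypothetical index names, is shown below. It is worth comparing EXPLAIN output before and after, since whether the NOT EXISTS form wins depends on the data.

```sql
-- Hypothetical index names; the column lists follow the suggestion above.
ALTER TABLE test
  ADD INDEX idx_distincted (distincted),
  ADD INDEX idx_device_ts_distincted (device_id, timestamp_detected, distincted);
```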

Why does an inner join ON condition not use the key?

I am creating a table as
create table temp_test2 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
create table temp_test1 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id) limit 3;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | t | ALL | date_id | NULL | NULL | NULL | 4 | NULL |
| 1 | SIMPLE | t2 | ALL | date_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
Why is the date_id key not used for either table? When I add date_id = something to the ON condition, the key is used:
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id and t.date_id =1) limit 3;
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | t | const | PRIMARY,date_id,date_id_2,date_id_3 | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | t2 | ref | date_id,date_id_2,date_id_3 | date_id | 4 | const | 1 | NULL |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
I also tried unique, composite primary, and composite keys, but none of them made a difference.
Can anyone explain why this is so?
Because your tables contain a very small number of records, the optimiser decides that it is not worth using the index. A table scan will do just as well.
Also, you selected all fields (SELECT *), so even if the index were used for the JOIN, a row lookup would still be required to get the full contents.
The query would be more likely to use the index if:
you selected only the date_id field
there were more than 4 rows in temp_test
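A quick way to test the first point is to re-run EXPLAIN while selecting only the indexed column. This is a sketch using the table names from the question's query, and with only a handful of rows the optimiser may still prefer a scan.

```sql
-- Select only the indexed column so the join can be answered from the index alone.
EXPLAIN SELECT t.date_id
FROM temp_test AS t
INNER JOIN temp_test2 AS t2 ON t2.date_id = t.date_id
LIMIT 3;
```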