SQL MAX and GROUP BY with WHERE


Given the following table:
CREATE TABLE `test` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`device_id` INT(11) UNSIGNED NOT NULL,
`distincted` BIT(1) NOT NULL DEFAULT b'0',
`timestamp_detected` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `idx1` (`device_id`),
INDEX `idx2` (`device_id`, `timestamp_detected`),
CONSTRAINT `test_ibfk_1` FOREIGN KEY (`device_id`) REFERENCES `device` (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB
ROW_FORMAT=COMPACT;
I want to perform a groupwise max on timestamp_detected grouped by device_id with the following:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id FROM test as lh1,
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id FROM test GROUP BY device_id) as lh2
WHERE lh1.timestamp_detected = lh2.max_timestamp_detected
AND lh1.device_id = lh2.device_id;
This yields the following results when run with explain:
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using where |
| 1 | PRIMARY | lh1 | ref | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 9 | lh2.device_id,lh2.max_timestamp_detected | 1 | Using index |
| 2 | DERIVED | test | range | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 4 | NULL | 15 | Using index for group-by |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+--------------------------+
Now there is a requirement that only those rows with distincted = 1 should be included in the results. I modified the query to the following:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id FROM test as lh1,
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id FROM test WHERE distincted = 1 GROUP BY device_id) as lh2
WHERE lh1.timestamp_detected = lh2.max_timestamp_detected
AND lh1.device_id = lh2.device_id;
It returns the correct results; however, it seems to take longer. Running an explain yields the following:
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 860 | Using where |
| 1 | PRIMARY | lh1 | ref | FK_location_history_device,device_id_timestamp_detected | device_id_timestamp_detected | 9 | lh2.device_id,lh2.max_timestamp_detected | 1 | Using index |
| 2 | DERIVED | test | index | FK_location_history_device,device_id_timestamp_detected | FK_location_history_device | 4 | NULL | 860 | Using where |
+----+-------------+------------+-------+---------------------------------------------------------+------------------------------+---------+------------------------------------------+------+-------------+
I tried adding the distincted column to index idx2 to no avail. How can I optimize this query?

The query is:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1 JOIN
(SELECT MAX(timestamp_detected) as max_timestamp_detected, device_id
FROM test
WHERE distincted = 1
GROUP BY device_id
) as lh2
on lh1.timestamp_detected = lh2.max_timestamp_detected AND
lh1.device_id = lh2.device_id;
For this query, I would suggest indexes on test(distincted, device_id, timestamp_detected) and test(device_id, timestamp_detected).
I also wonder if you would get better performance with this equivalent query:
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1
WHERE distincted = 1 AND
NOT EXISTS (SELECT 1
FROM test t
WHERE t.distincted = 1 AND
t.device_id = lh1.device_id AND
t.timestamp_detected > lh1.timestamp_detected
);
And these two indexes: test(distincted) and test(device_id, timestamp_detected, distincted).
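A minimal sketch to check that the two forms above agree, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL. The sample rows and the index names idx_flag_dev_ts and idx_dev_ts are made up for illustration, and this says nothing about which plan MySQL will pick:

```python
import sqlite3

# SQLite stands in for MySQL; distincted is BIT(1) in the question, INTEGER here.
# The rows and index names below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    device_id INTEGER NOT NULL,
    distincted INTEGER NOT NULL DEFAULT 0,
    timestamp_detected TEXT NOT NULL
);
CREATE INDEX idx_flag_dev_ts ON test (distincted, device_id, timestamp_detected);
CREATE INDEX idx_dev_ts ON test (device_id, timestamp_detected);
INSERT INTO test (id, device_id, distincted, timestamp_detected) VALUES
    (1, 10, 1, '2020-01-01 00:00:00'),
    (2, 10, 1, '2020-01-02 00:00:00'),
    (3, 10, 0, '2020-01-03 00:00:00'),
    (4, 20, 0, '2020-01-01 00:00:00'),
    (5, 20, 1, '2020-01-05 00:00:00'),
    (6, 30, 0, '2020-01-09 00:00:00');
""")

# Form 1: join against the per-device maximum among distincted = 1 rows.
join_form = """
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1
JOIN (SELECT MAX(timestamp_detected) AS max_timestamp_detected, device_id
      FROM test
      WHERE distincted = 1
      GROUP BY device_id) AS lh2
  ON lh1.timestamp_detected = lh2.max_timestamp_detected
 AND lh1.device_id = lh2.device_id
ORDER BY lh1.id
"""

# Form 2: keep a distincted = 1 row only if no later distincted = 1 row
# exists for the same device.
not_exists_form = """
SELECT lh1.id, lh1.timestamp_detected, lh1.device_id
FROM test lh1
WHERE lh1.distincted = 1
  AND NOT EXISTS (SELECT 1
                  FROM test t
                  WHERE t.distincted = 1
                    AND t.device_id = lh1.device_id
                    AND t.timestamp_detected > lh1.timestamp_detected)
ORDER BY lh1.id
"""

join_rows = conn.execute(join_form).fetchall()
not_exists_rows = conn.execute(not_exists_form).fetchall()
print(join_rows)
print(not_exists_rows)
```

One caveat: the join form does not re-check distincted on lh1, so a distincted = 0 row that shares a device_id and timestamp_detected with the per-device maximum would be returned by the join but not by the NOT EXISTS form. The sample data avoids that case.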

Related

MySQL InnoDB count(*) performance

CREATE TABLE `app_user` (
`uid` int NOT NULL,
`uname` varchar(20) NOT NULL,
`upwd` varchar(20) DEFAULT NULL,
PRIMARY KEY (`uid`),
UNIQUE KEY `uname` (`uname`)
) ENGINE=InnoDB
This is the table I used for testing, and I inserted a million records into it. When I use the following SQL to count the rows, it takes 20 seconds to execute:
select count(*) from app_user;
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | app_user | NULL | index | NULL | uid | 4 | NULL | 996948 | 100.00 | Using index |
+----+-------------+----------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
In this case, every record's uid is greater than 0, so I can replace the first query with this one:
select count(*) from app_user where uid > 0; -- in this case, all uid > 0
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | app_user | NULL | range | PRIMARY,uid | PRIMARY | 4 | NULL | 498474 | 100.00 | Using where; Using index |
+----+-------------+----------+------------+-------+---------------+---------+---------+------+--------+----------+--------------------------+
It takes only 500 milliseconds. Why does this happen?
I want to know why the second query executes so much faster.

MySQL not using primary index when comparing with subquery result

I have a table with the following schema:
CREATE TABLE `stock` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`currency` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`against` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`date` date NOT NULL,
`time` time NOT NULL,
`rate` double(8,4) NOT NULL,
`ask` double(8,4) NOT NULL,
`bid` double(8,4) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `stock_currency_index` (`currency`),
KEY `stock_against_index` (`against`),
KEY `stock_date_index` (`date`),
KEY `stock_time_index` (`time`),
KEY `created_at_index` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=244221 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
When I execute this query, MySQL uses an index:
mysql> explain select max(id) from stock group by currency;
+----+-------------+-------+------------+-------+----------------------+----------------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------------+----------------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | stock | NULL | range | stock_currency_index | stock_currency_index | 11 | NULL | 2 | 100.00 | Using index for group-by |
+----+-------------+-------+------------+-------+----------------------+----------------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
When I execute this query, MySQL uses the primary index too:
mysql> explain select * from stock where id in (244221, 244222);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | stock | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 2 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
But when I combine these two queries, the PRIMARY index is not used. I am confused. What am I doing wrong?
mysql> explain select * from stock where id in (select max(id) from stock group by currency);
+----+-------------+-------+------------+-------+----------------------+----------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------------+----------------------+---------+------+--------+----------+--------------------------+
| 1 | PRIMARY | stock | NULL | ALL | NULL | NULL | NULL | NULL | 221800 | 100.00 | Using where |
| 2 | SUBQUERY | stock | NULL | range | stock_currency_index | stock_currency_index | 11 | NULL | 2 | 100.00 | Using index for group-by |
+----+-------------+-------+------------+-------+----------------------+----------------------+---------+------+--------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

First, try rewriting the query as:
select s.*
from stock s join
(select max(id) as maxid
from stock
group by currency
) ss
on ss.maxid = s.id;
Second, I would be tempted to put an index on stock(currency, id) and to use:
select s.*
from stock s
where s.id = (select max(s2.id) from stock s2 where s2.currency = s.currency);
Do either of these perform better?
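To confirm that both rewrites return the same rows, here is a small sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL. The table is trimmed to three columns, and the sample rows and the index name stock_currency_id are hypothetical. It checks results only, not performance:

```python
import sqlite3

# SQLite stands in for MySQL; only the columns needed for the comparison are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    id INTEGER PRIMARY KEY,
    currency TEXT NOT NULL,
    against TEXT NOT NULL
);
-- Composite index suggested in the answer (the name is made up).
CREATE INDEX stock_currency_id ON stock (currency, id);
INSERT INTO stock (id, currency, against) VALUES
    (1, 'USD', 'EUR'), (2, 'EUR', 'USD'), (3, 'USD', 'GBP'),
    (4, 'GBP', 'USD'), (5, 'EUR', 'GBP');
""")

# Rewrite 1: join against the per-currency maximum id.
join_sql = """
SELECT s.*
FROM stock s
JOIN (SELECT MAX(id) AS maxid FROM stock GROUP BY currency) ss
  ON ss.maxid = s.id
ORDER BY s.id
"""

# Rewrite 2: correlated subquery, evaluated per outer row.
correlated_sql = """
SELECT s.*
FROM stock s
WHERE s.id = (SELECT MAX(s2.id) FROM stock s2 WHERE s2.currency = s.currency)
ORDER BY s.id
"""

join_latest = conn.execute(join_sql).fetchall()
correlated_latest = conn.execute(correlated_sql).fetchall()
print(join_latest)
print(correlated_latest)
```

Which of the two is faster on the real table still has to be measured in MySQL itself with EXPLAIN and actual timings.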

MySQL confused about IN (CONST vs UNION vs SELECT FROM (UNION))

Can someone please explain why there is such a big difference between these queries?
The results of all of them are exactly the same.
Performance of query 1: very good, query 2: bad, query 3: good.
Why does the select from table test (id 1) in query 2 examine all rows? And why does possible_keys not contain PRIMARY, which is actually used?
Table:
CREATE TABLE `test` (
`id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test` ADD PRIMARY KEY (`id`);
Data:
DROP PROCEDURE IF EXISTS insert1000;
DELIMITER $$
CREATE PROCEDURE insert1000()
BEGIN
    SET @i = 1;
    WHILE @i < 1000 DO
        INSERT INTO `test` VALUES (@i);
        SET @i = @i + 1;
END WHILE;
END
$$
DELIMITER ;
CALL insert1000();
DROP PROCEDURE insert1000;
Query 1:
SELECT `id` FROM `test` WHERE `id` IN (2, 3)
Query 1 explanation:
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | range | PRIMARY | PRIMARY | 4 | NULL | 2 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
Query 2:
SELECT `id` FROM `test` WHERE `id` IN (SELECT 2 UNION SELECT 3)
Query 2 explanation:
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | PRIMARY | test | index | NULL | PRIMARY | 4 | NULL | 999 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | DEPENDENT UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
Query 3:
SELECT `id` FROM `test` WHERE `id` IN (SELECT * FROM (SELECT 2 UNION SELECT 3) AS `derived`)
Query 3 explanation:
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | |
| 1 | PRIMARY | test | eq_ref | PRIMARY | PRIMARY | 4 | derived.2 | 1 | Using where; Using index |
| 2 | MATERIALIZED | <derived3> | ALL | NULL | NULL | NULL | NULL | 2 | |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 4 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union3,4> | ALL | NULL | NULL | NULL | NULL | NULL | |
+------+--------------+-------------+--------+---------------+---------+---------+-----------+------+--------------------------+

The Inner workings of the MySQL optimizer...
While query 2 and query 3 both require a full table scan (can't use the index), their different syntax makes the optimizer use different strategies.
You can see it more clearly(ish) by running EXPLAIN EXTENDED SELECT ... and then running SHOW WARNINGS;.
Here's the extended plan for query 2:
select `test`.`id` AS `id`
from `test`
where <in_optimizer>(`test`.`id`,<exists>(select 2 having (<cache>(`test`.`id`) = <ref_null_helper>(2))
union
select 3 having (<cache>(`test`.`id`) = <ref_null_helper>(3))
))
The optimizer translates IN to EXISTS and then compares the results of 2 queries SELECT 2 and SELECT 3 to the row that is scanned in test.
Here's the extended plan for query 3:
select `test`.`id` AS `id`
from `test`
where <in_optimizer>(`test`.`id`,<exists>(select 1 from (select 2 AS `2` union select 3 AS `3`) `derived` where (<cache>(`test`.`id`) = `derived`.`2`)))
You can see that in this case the optimizer is running your original UNION to create a derived table with the values 2 and 3, and then compares this table once to the data it scans in table test.
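Since the three forms differ only in how the optimizer executes them, it can help to confirm first that they really return identical rows. A small sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL, so it says nothing about MySQL's plans. The labels in the queries dictionary are made up:

```python
import sqlite3

# SQLite stands in for MySQL; the table holds ids 1..999 like the question's procedure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER NOT NULL PRIMARY KEY)")
conn.executemany("INSERT INTO test VALUES (?)", [(i,) for i in range(1, 1000)])

# The three query shapes from the question, keyed by an illustrative label.
queries = {
    "const list": "SELECT id FROM test WHERE id IN (2, 3)",
    "union": "SELECT id FROM test WHERE id IN (SELECT 2 UNION SELECT 3)",
    "derived union": (
        "SELECT id FROM test WHERE id IN "
        "(SELECT * FROM (SELECT 2 UNION SELECT 3) AS derived)"
    ),
}

# Run each query and sort the rows so the comparison ignores output order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

With identical results confirmed, any timing difference in MySQL comes from the execution strategy, which is what EXPLAIN and SHOW WARNINGS expose.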

why inner join on condition does not use key

I am creating two tables as follows:
create table temp_test2 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
create table temp_test1 (
date_id int(11) NOT NULL DEFAULT '0',
`date` date NOT NULL,
`day` int(11) NOT NULL,
PRIMARY KEY (date_id)
);
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id) limit 3;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | t | ALL | date_id | NULL | NULL | NULL | 4 | NULL |
| 1 | SIMPLE | t2 | ALL | date_id | NULL | NULL | NULL | 4 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------------------------------+
Why is the date_id key not used for either table? When I add date_id = something to the ON condition, the key is used:
explain select * from temp_test as t inner join temp_test2 as t2 on (t2.date_id =t.date_id and t.date_id =1) limit 3;
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | t | const | PRIMARY,date_id,date_id_2,date_id_3 | PRIMARY | 4 | const | 1 | NULL |
| 1 | SIMPLE | t2 | ref | date_id,date_id_2,date_id_3 | date_id | 4 | const | 1 | NULL |
+----+-------------+-------+-------+-------------------------------------+---------+---------+-------+------+-------+
I also tried unique, composite primary, and composite keys, but it still does not work.
Can anyone explain why this is so?

Because your tables contain a very small number of records, the optimiser decides that it is not worth using the index. A table scan will do just as well.
Also, you selected all fields (SELECT *). Even if it used the index to execute the JOIN, a row lookup would still be required to get the full contents.
The query would be more likely to use the index if:
you selected only the date_id field
there were more than 4 rows in temp_test

Why is MySQL not using this index with a GROUP BY query?

I have this big table (about a million records) and I'm trying to retrieve the last record of each type.
The table, the index and the query are very simple, and the fact that MySQL is not using the index means I must be overlooking something.
The table looks like this:
CREATE TABLE `MyTable001` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`TypeField` int(11) NOT NULL,
`Value` bigint(20) NOT NULL,
`Timestamp` bigint(20) NOT NULL,
`AnotherField1` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_MyTable001_TypeField` (`TypeField`),
KEY `idx_MyTable001_Timestamp` (`Timestamp`)
) ENGINE=MyISAM
Show Index gives this:
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| MyTable001 | 0 | PRIMARY | 1 | id | A | 626141 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_TypeField | 1 | TypeField | A | 458 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_Timestamp | 1 | Timestamp | A | 156535 | NULL | NULL | | BTREE | | |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
But when I execute EXPLAIN for the following query:
SELECT *
FROM MyTable001
GROUP BY TypeField
ORDER BY id DESC
The result is this:
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | MyTable001 | ALL | NULL | NULL | NULL | NULL | 626141 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
Why won't MySQL use idx_MyTable001_TypeField?
Thanks in advance.

The problem is that the contents of the fields not in the GROUP BY are still being inspected. Therefore, all rows must be read, and it's better to do a full table scan. This is clearly seen with the following examples:
SELECT TypeField, COUNT(*) FROM MyTable001 GROUP BY TypeField uses the index.
SELECT TypeField, COUNT(id) FROM MyTable001 GROUP BY TypeField does not.
The original query was incorrect. The correct query is:
SELECT l.*
FROM MyTable001 l
JOIN (
SELECT MAX(id) m_id
FROM MyTable001 l
GROUP BY l.TypeField) l_id ON l_id.m_id = l.id;
It takes 260ms in a table with 630k records. Joachim Isaksson's and fancyPants' alternatives took several minutes in my tests.
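As a quick check that the corrected query returns the row with the highest id for each TypeField, here is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL. The table is trimmed to three columns and the sample rows are made up; the timing above is not reproduced here:

```python
import sqlite3

# SQLite stands in for MySQL/MyISAM; only three of the original columns are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable001 (
    id INTEGER PRIMARY KEY,
    TypeField INTEGER NOT NULL,
    Value INTEGER NOT NULL
);
CREATE INDEX idx_MyTable001_TypeField ON MyTable001 (TypeField);
-- Hypothetical sample rows: types 7 and 8 each have two records.
INSERT INTO MyTable001 (id, TypeField, Value) VALUES
    (1, 7, 100), (2, 8, 200), (3, 7, 110), (4, 9, 300), (5, 8, 210);
""")

# Pick MAX(id) per TypeField in a derived table, then join back for the full row.
latest_per_type = conn.execute("""
SELECT l.*
FROM MyTable001 l
JOIN (SELECT MAX(id) AS m_id
      FROM MyTable001
      GROUP BY TypeField) l_id
  ON l_id.m_id = l.id
ORDER BY l.TypeField
""").fetchall()
print(latest_per_type)
```

The derived table only needs TypeField and id, which is why the grouping step can be answered from an index while the outer join fetches full rows by primary key.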
