First, I create a simple database with one MyISAM table with an indexed field called feature.
CREATE DATABASE test;
USE test;
CREATE TABLE data(
id INT(11) NOT NULL AUTO_INCREMENT,
feature VARCHAR(64),
PRIMARY KEY (id)
) ENGINE=MyISAM;
INSERT INTO data VALUES (1, 'a'), (2, 'b');
CREATE INDEX data_feature ON data(feature);
Then, when I test a GROUP BY query with a count, it doesn't use the index when the COUNT() is made by id (see Extra column at the end of the EXPLAIN).
mysql> EXPLAIN SELECT feature, COUNT(1) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(*) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(id) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
I have tested it on MySQL Community 8.0.21 and MariaDB 10.3.25.
COUNT(id) obligates the Optimizer to check id for being NOT NULL. The standard pattern is to simply say COUNT(*).
InnoDB probably works differently. But this is because the PRIMARY KEY is implicitly tacked onto the end of any secondary index. That makes INDEX(feature) work like INDEX(feature, id), in which case it would be a "covering" index as indicated by "Using index".
(There is virtually no reason to stick with MyISAM in this decade.)
Related
Assuming I have a table as:
create table any_table (any_column_1 int, any_column_2 varchar(255));
create index any_table_any_column_1_IDX USING BTREE ON any_table (any_column_1);
(Note: Index type should not matter here)
I was wondering if querying any_column with int or string have any impact on performance, i.e. does
select * from any_table where any_column_1 = 12345;
have any differences in terms of performance with this one?
select * from any_table where any_column_1 = '12345';
I have looked around the web and really have not faced this particular case.
It should be fine to do this either way for an indexed integer column. When you compare an integer column to a constant, the constant value is cast to an integer whether you format it as an integer or a string.
You can confirm this with EXPLAIN. In both cases, the EXPLAIN shows that it will use the index (type: ref indicates an index lookup), and the performance will be the same.
mysql> explain select * from any_table where any_column_1 = 12345;
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | any_table | NULL | ref | any_table_any_column_1_IDX | any_table_any_column_1_IDX | 5 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
mysql> explain select * from any_table where any_column_1 = '12345';
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | any_table | NULL | ref | any_table_any_column_1_IDX | any_table_any_column_1_IDX | 5 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
If you had indexed the string column in your example, any_column_2, it would make a difference because the collation of a string column must match the collation of the value you compare it to. A string literal will be cast to a compatible collation by default, so it uses the index:
create index any_table_any_column_2_IDX USING BTREE ON any_table (any_column_2);
mysql> explain select * from any_table where any_column_2 = '12345';
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | any_table | NULL | ref | any_table_any_column_2_IDX | any_table_any_column_2_IDX | 768 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+----------------------------+----------------------------+---------+-------+------+----------+-------+
But an integer literal has no collation, so you get warnings, and the index cannot be used. The EXPLAIN shows type: ALL so it will do a table-scan and that will have poor performance if you query a table with many rows.
mysql> explain select * from any_table where any_column_2 = 12345;
+----+-------------+-----------+------------+------+----------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+----------------------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | any_table | NULL | ALL | any_table_any_column_2_IDX | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-----------+------------+------+----------------------------+------+---------+------+------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1739 | Cannot use ref access on index 'any_table_any_column_2_IDX' due to type or collation conversion on field 'any_column_2' |
| Warning | 1739 | Cannot use range access on index 'any_table_any_column_2_IDX' due to type or collation conversion on field 'any_column_2' |
| Note | 1003 | /* select#1 */ select `test2`.`any_table`.`any_column_1` AS `any_column_1`,`test2`.`any_table`.`any_column_2` AS `any_column_2` from `test2`.`any_table` where (`test2`.`any_table`.`any_column_2` = 12345) |
+---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
From my previous post I figured out that if I refer multiple columns in a select query I need a compound index, so for my table
CREATE TABLE price (
dt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
marketId INT,
buy DOUBLE,
sell DOUBLE,
PRIMARY KEY (dt, marketId),
FOREIGN KEY fk_price_market(marketId) REFERENCES market(id) ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=INNODB;
I created the compound index:
CREATE INDEX idx_price_market_buy ON price (marketId, buy, sell, dt);
now the query
select max(dt) from price where marketId=309 and buy>0.3;
executes fast enough within 0.02 sec, but a similar query with the same combination of columns
select max(buy) from price where marketId=309 and dt>'2019-10-29 15:00:00';
takes 0.18 sec that is relatively slow.
descs of these queries look a bit different:
mysql> desc select max(dt) from price where marketId=309 and buy>0.3;
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
| 1 | SIMPLE | price | NULL | range | idx_price_market,idx_price_buy,idx_price_market_buy | idx_price_market_buy | 13 | NULL | 50442 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select max(buy) from price where marketId=309 and dt>'2019-10-29 15:00:00';
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | price | NULL | ref | PRIMARY,idx_price_market,idx_price_market_buy | idx_price_market_buy | 4 | const | 202176 | 50.00 | Using where; Using index |
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
for example, key_len differs. What does this mean?
And the main question: what is the difference between buy and dt columns? Why switching them places in the query affects the performance?
I have a working, nice, indexed SQL query aggregating notes (sum of ints) for all my users and others stuffs. This is "query A".
I want to use this aggregated notes in others queries, say "query B".
If I create a View based on "query A", will the indexes of the original query will be used when needed if I join it in "query B" ?
Is that true for MySQL ? For others flavors of SQL ?
Thanks.
In MySQL, you cannot create an index on a view. MySQL uses indexes of the underlying tables when you query data against the views that use the merge algorithm. For the views that use the temptable algorithm, indexes are not utilized when you query data against the views.
https://www.percona.com/blog/2007/08/12/mysql-view-as-performance-troublemaker/
Here's a demo table. It has a userid attribute column and a note column.
mysql> create table t (id serial primary key, userid int not null, note int, key(userid,note));
If you do an aggregation to get the sum of note per userid, it does an index-scan on (userid, note).
mysql> explain select userid, sum(note) from t group by userid;
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)
If we create a view for the same query, then we can see that querying the view uses the same index on the underlying table. Views in MySQL are pretty much like macros — they just query the underlying table.
mysql> create view v as select userid, sum(note) from t group by userid;
Query OK, 0 rows affected (0.03 sec)
mysql> explain select * from v;
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
2 rows in set (0.00 sec)
So far so good.
Now let's create a table to join with the view, and join to it.
mysql> create table u (userid int primary key, name text);
Query OK, 0 rows affected (0.09 sec)
mysql> explain select * from v join u using (userid);
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | test.u.userid | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
3 rows in set (0.01 sec)
I tried to use hints like straight_join to force it to read v then join to u.
mysql> explain select * from v straight_join u on (v.userid=u.userid);
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
"Using join buffer (Block Nested Loop)" is MySQL's terminology for "no index used for the join." It's just looping over the table the hard way -- by reading batches of rows from start to finish of the table.
I tried to use force index to tell MySQL that type=ALL is to be avoided.
mysql> explain select * from v straight_join u force index(PRIMARY) on (v.userid=u.userid);
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | v.userid | 1 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
Maybe this is using an index for the join? But it's weird that table u is before table t in the EXPLAIN. I'm frankly not sure how to understand what it's doing, given the order of rows in this EXPLAIN report. I would expect the joined table should come after the primary table of the query.
I only put a few rows of data into each table. One might get some different EXPLAIN results with a larger representative sample of test data. I'll leave that to you to try.
We have two tables - the first is relatively big (contact table) 250k rows and the second is small(user table, < 10 rows). On mysql 5.6 version I have next explain result:
EXPLAIN SELECT
o0_.id AS id_0,
o8_.first_name,
o8_.last_name
FROM
contact o0_
LEFT JOIN user o8_ ON o0_.user_owner_id = o8_.id
LIMIT
25 OFFSET 100
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 253030 | Using index |
| 1 | SIMPLE | o8_ | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
2 rows in set (0,00 sec)
When i use force index for join:
EXPLAIN SELECT
o0_.id AS id_0,
o8_.first_name,
o8_.last_name
FROM
contact o0_
LEFT JOIN user o8_ force index for join(`PRIMARY`) ON o0_.user_owner_id = o8_.id
LIMIT
25 OFFSET 100
or adding indexes on fields which appears in select clause (first_name, last_name) on user table:
alter table user add index(first_name, last_name);
Explain result changes to this:
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 253030 | Using index |
| 1 | SIMPLE | o8_ | eq_ref | PRIMARY | PRIMARY | 4 | o0_.user_owner_id | 1 | NULL |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
2 rows in set (0,00 sec)
On mysql 5.5 version I have same explain result without additional indexes:
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 255706 | Using index |
| 1 | SIMPLE | o8_ | eq_ref | PRIMARY | PRIMARY | 4 | o0_.user_owner_id | 1 | |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
2 rows in set (0.00 sec)
Why i need force use PRIMARY index or add extra indexes on mysql 5.6 version?
Same behavior occurs with other selects, when join small tables.
If you have a table with so few rows, it may actually be faster to do a full table scan, than going to an index, locate the records and then go back to the table. If you have other fields in the user table apart from the 3 in the query, then you may consider adding a covering index, but franly, I do not think that any of this would have significant affect on the speed of the query.
I have table users with following columns.
id, name, updated_at
Query with explain plan
mysql> explain select * from users group by users.id order by users.updated_at desc limit 10;
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 190551 | Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
Created a new index
create index test_id_updated_at on users (id, updated_at);
After creating a new index still getting the same result with explain plan.
mysql> explain select * from users group by users.id order by users.updated_at desc limit 10;
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 190551 | Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
1 row in set (0.00 sec)
After forcing a new index in query, still getting same result.
I don't understand why it says 'Using filesort' after creating a new index.
I replicated your cenario, and mysql used the index:
mysql> explain SELECT * FROM test_index GROUP BY id ORDER BY updated_at DESC;
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | test_index | ALL | NULL | NULL | NULL | NULL | 393520 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
1 row in set (0.00 sec)
mysql> CREATE INDEX index1 ON test_index(id, updated_at);
Query OK, 0 rows affected (1.85 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain SELECT * FROM test_index GROUP BY id ORDER BY updated_at DESC;
+----+-------------+------------+-------+---------------+--------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | test_index | index | NULL | index1 | 12 | NULL | 393520 | Using temporary; Using filesort |
+----+-------------+------------+-------+---------------+--------+---------+------+--------+---------------------------------+
1 row in set (0.00 sec)
Maybe your mysql version? Tested this on 5.5.
It's hard to optimize this query (remove the using filesort and temporary) without knowing the full table structure and what you want to retrieve with this (specify the fields instead of "*")