From my previous post I figured out that if I refer multiple columns in a select query I need a compound index, so for my table
CREATE TABLE price (
dt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
marketId INT,
buy DOUBLE,
sell DOUBLE,
PRIMARY KEY (dt, marketId),
FOREIGN KEY fk_price_market(marketId) REFERENCES market(id) ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=INNODB;
I created the compound index:
CREATE INDEX idx_price_market_buy ON price (marketId, buy, sell, dt);
now the query
select max(dt) from price where marketId=309 and buy>0.3;
executes fast enough within 0.02 sec, but a similar query with the same combination of columns
select max(buy) from price where marketId=309 and dt>'2019-10-29 15:00:00';
takes 0.18 sec that is relatively slow.
descs of these queries look a bit different:
mysql> desc select max(dt) from price where marketId=309 and buy>0.3;
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
| 1 | SIMPLE | price | NULL | range | idx_price_market,idx_price_buy,idx_price_market_buy | idx_price_market_buy | 13 | NULL | 50442 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------------------------------------+----------------------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select max(buy) from price where marketId=309 and dt>'2019-10-29 15:00:00';
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | price | NULL | ref | PRIMARY,idx_price_market,idx_price_market_buy | idx_price_market_buy | 4 | const | 202176 | 50.00 | Using where; Using index |
+----+-------------+-------+------------+------+-----------------------------------------------+----------------------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
for example, key_len differs. What does this mean?
And the main question: what is the difference between buy and dt columns? Why switching them places in the query affects the performance?
Related
Consider the following data in the table of books:
bId serial
1 123
2 234
5 445
9 556
There's another table of missing_books with a latest_known_serial whose values come from the following query:
UPDATE missing_books mb
SET latest_known_serial = (
SELECT serial FROM books b
WHERE b.bId < mb.bId
ORDER BY b.bId DESC LIMIT 1)
The aforementioned query produces the following:
bId latest_known_serial
3 234
4 234
6 445
7 445
8 445
It all works, but I was wondering if there's any more performant way to do this as it actually hits big tables.
You can make performance increase by using indexes to make your query faster: I tried to simulate your query:
mysql> EXPLAIN UPDATE missing_books mb
-> SET latest_known_serial = (
-> SELECT serial FROM books b
-> WHERE b.bId < mb.bId
-> ORDER BY b.bId DESC LIMIT 1);
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------------------+
| 1 | UPDATE | mb | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | b | NULL | ALL | bId | NULL | NULL | NULL | 5 | 33.33 | Range checked for each record (index map: 0x1); Using filesort |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
As you can see in the above query, It uses a full table scan (type: ALL) to perform the operation: Optimizer didn't select to use the indexes (unique) defined on bId column.
Now Let's make it Primary Key instead of unique index, then run the optimizer to see the result set:
Drop Unique index first:
mysql> ALTER TABLE books DROP INDEX bId;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Then Define PK on bId Column
mysql> ALTER TABLE books
ADD PRIMARY KEY (bId);
Now test again:
mysql> EXPLAIN UPDATE missing_books mb SET latest_known_serial = ( SELECT serial FROM books b WHERE b.bId < mb.bId ORDER BY b.bId DESC LIMIT 1);
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------+
| 1 | UPDATE | mb | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
| 2 | DEPENDENT SUBQUERY | b | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 1 | 33.33 | Using where; Backward index scan |
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------+
2 rows in set, 2 warnings (0.00 sec)
As you can see in the key column, optimizer used the PK index defined on books table! You can test the speed by making small adjustments.
First, I create a simple database with one MyISAM table with an indexed field called feature.
CREATE DATABASE test;
USE test;
CREATE TABLE data(
id INT(11) NOT NULL AUTO_INCREMENT,
feature VARCHAR(64),
PRIMARY KEY (id)
) ENGINE=MyISAM;
INSERT INTO data VALUES (1, 'a'), (2, 'b');
CREATE INDEX data_feature ON data(feature);
Then, when I test a GROUP BY query with a count, it doesn't use the index when the COUNT() is made by id (see Extra column at the end of the EXPLAIN).
mysql> EXPLAIN SELECT feature, COUNT(1) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(*) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT feature, COUNT(id) FROM data GROUP BY feature;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| 1 | SIMPLE | data | NULL | index | data_feature | data_feature | 259 | NULL | 2 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
I have tested it on MySQL Community 8.0.21 and MariaDB 10.3.25.
COUNT(id) obligates the Optimizer to check id for being NOT NULL. The standard pattern is to simply say COUNT(*).
InnoDB probably works differently. But this is because the PRIMARY KEY is implicitly tacked onto the end of any secondary index. That makes INDEX(feature) work like INDEX(feature, id), in which case it would be a "covering" index as indicated by "Using index".
(There is virtually no reason to stick with MyISAM in this decade.)
I have the following table with a multi-value index set up on a JSON integer array:
CREATE TABLE test (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
catIds JSON NOT NULL,
PRIMARY KEY (id),
KEY test_categories ((CAST(catIds AS UNSIGNED ARRAY)))
);
I've inserted 200,000 records such as:
INSERT INTO test (catIds) VALUES('[123, 456]');
...
The issue is, querying this table on the catIds field with or without the index does not change the execution speed. I've tried querying with both MEMBER OF() and JSON_CONTAINS(), with and without the index; the speeds are the same.
And indeed, EXPLAIN shows that these queries do not use the index:
mysql> EXPLAIN SELECT count(*) FROM test WHERE 51 MEMBER OF (catIds);
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 201416 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+-------+------+----------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select count(0) AS `count(*)` from `test`.`test` where <cache>(51) member of (`test`.`test`.`catIds`) |
+-------+------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT count(*) FROM test WHERE JSON_CONTAINS(catIds, '51');
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 201416 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS;
+-------+------+---------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select count(0) AS `count(*)` from `test`.`test` where json_contains(`test`.`test`.`catIds`,<cache>('51')) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Why is the index on catIds not used for these queries? What did I miss?
You must use a JSON path for your index definition and the predicate in your query.
https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued says:
The only type of expression that is permitted in a multi-valued key part is a JSON path. The path need not point to an existing element in a JSON document inserted into the indexed column, but must itself be syntactically valid.
I tested this:
mysql> alter table test add key bk1 ((cast(catIds->'$[*]' as unsigned array)));
Query OK, 0 rows affected (0.07 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain SELECT count(*) FROM test WHERE 903 MEMBER OF (catIds->'$[*]');
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | test | NULL | ref | bk1 | bk1 | 9 | const | 8 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
I have no doubt that using this feature will increase the WTFs per minute during code reviews.
Also keep in mind that MySQL will skip using an index if the optimizer thinks it won't help. Like if the table only has a few rows, or if the value you are searching for occurs in a majority of the rows. This is not specific to the multi-valued index, it has been part of MySQL's optimizer behavior with normal indexes for many years.
Here's an example: I have 4096 rows in my table, but they're all the same. Even if I search for a value that occurs in the table, MySQL detects that it would match a majority of rows (all rows, in this case) and avoids the index.
mysql> select distinct catIds from test;
+--------------+
| catIds |
+--------------+
| [258.0, 7.0] |
+--------------+
1 row in set (0.00 sec)
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 4096 |
+----------+
1 row in set (0.01 sec)
mysql> explain SELECT count(*) FROM test WHERE 258 MEMBER OF (catIds);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 4096 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
There are few of reasons why multi-valued indexes in MySQL are so slow:
*) those are always secondary indexes and thus have penalty compared to primary indexes in InnoDB
*) same as regular index it points to the record, which then contains JSON which has to be unpacked in order to be processed.
*) unlike regular indexes, multi-valued indexes can't be covering, they always have to fetch row from the table
All this narrows conditions where multi-valued indexes are beneficial. Best conditions for them are:
*) high selectivity of the index, the higher the better
*) lots of rows in table
*) big json docs - this doesn't make index scan faster, but rather plain scans are slow due to out of row blob storage in innodb and thus allows index scan to be shiner
I have a working, nice, indexed SQL query aggregating notes (sum of ints) for all my users and others stuffs. This is "query A".
I want to use this aggregated notes in others queries, say "query B".
If I create a View based on "query A", will the indexes of the original query will be used when needed if I join it in "query B" ?
Is that true for MySQL ? For others flavors of SQL ?
Thanks.
In MySQL, you cannot create an index on a view. MySQL uses indexes of the underlying tables when you query data against the views that use the merge algorithm. For the views that use the temptable algorithm, indexes are not utilized when you query data against the views.
https://www.percona.com/blog/2007/08/12/mysql-view-as-performance-troublemaker/
Here's a demo table. It has a userid attribute column and a note column.
mysql> create table t (id serial primary key, userid int not null, note int, key(userid,note));
If you do an aggregation to get the sum of note per userid, it does an index-scan on (userid, note).
mysql> explain select userid, sum(note) from t group by userid;
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)
If we create a view for the same query, then we can see that querying the view uses the same index on the underlying table. Views in MySQL are pretty much like macros — they just query the underlying table.
mysql> create view v as select userid, sum(note) from t group by userid;
Query OK, 0 rows affected (0.03 sec)
mysql> explain select * from v;
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
2 rows in set (0.00 sec)
So far so good.
Now let's create a table to join with the view, and join to it.
mysql> create table u (userid int primary key, name text);
Query OK, 0 rows affected (0.09 sec)
mysql> explain select * from v join u using (userid);
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | test.u.userid | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
3 rows in set (0.01 sec)
I tried to use hints like straight_join to force it to read v then join to u.
mysql> explain select * from v straight_join u on (v.userid=u.userid);
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
"Using join buffer (Block Nested Loop)" is MySQL's terminology for "no index used for the join." It's just looping over the table the hard way -- by reading batches of rows from start to finish of the table.
I tried to use force index to tell MySQL that type=ALL is to be avoided.
mysql> explain select * from v straight_join u force index(PRIMARY) on (v.userid=u.userid);
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | v.userid | 1 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
Maybe this is using an index for the join? But it's weird that table u is before table t in the EXPLAIN. I'm frankly not sure how to understand what it's doing, given the order of rows in this EXPLAIN report. I would expect the joined table should come after the primary table of the query.
I only put a few rows of data into each table. One might get some different EXPLAIN results with a larger representative sample of test data. I'll leave that to you to try.
Table has 1 500 000 records, 1 250 000 of them have field = 'z'.
I need select random not 'z' field.
$random = mt_rand(1, 250000);
$query = "SELECT field FROM table WHERE field != 'z' LIMIT $random, 1";
It is working ok.
Then I decided to optimize it and indexed field in table.
Result was strange - it was slower ~3 times. I tested it.
Why it is slower? Is not such indexing should make it faster?
my ISAM
explain with index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table range field field 758 NULL 1139287 Using
explain without index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 1484672 Using where
Summary
The problem is that field is not a good candidate for indexing, due to the nature of b-trees.
Explanation
Let's suppose you have a table that has the results of 500,000 coin tosses, where the toss is either 1 (heads) or 0 (tails):
CREATE TABLE toss (
id int NOT NULL AUTO_INCREMENT,
result int NOT NULL DEFAULT '0',
PRIMARY KEY ( id )
)
select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
| 0 | 250290 |
| 1 | 249710 |
+--------+----------+
2 rows in set (0.40 sec)
If you want to select one toss (at random) where the toss was tails, then you need to search through your table, picking a random starting place.
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.06 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | ALL | NULL | NULL | NULL | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
You see that you're basically searching sequentially through all of the rows to find a match.
If you create an index on the toss field, then your index will contain two values, each with roughly 250,000 entries.
create index foo on toss ( result );
Query OK, 500000 rows affected (2.48 sec)
Records: 500000 Duplicates: 0 Warnings: 0
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.25 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | range | foo | foo | 4 | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
Now you're searching fewer records, but the time to search increased from 0.06 to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning a table, for indexes with a large number of rows for a given key.
Let's look at the indexes on this table:
show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss | 0 | PRIMARY | 1 | id | A | 500000 | NULL | NULL | | BTREE | |
| toss | 1 | foo | 1 | result | A | 2 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
The PRIMARY index is a good index: there are 500,000 rows, and there are 500,000 values. Arranged in a BTREE, you can quickly identify a single row based on the id.
The foo index is a bad index: there are 500,000 rows, but only 2 possible values. This is pretty much the worst possible case for a BTREE -- all of the overhead of searching the index, and still having to search through the results.
In the absence of an order by clause, that LIMIT $random, 1 starts at some undefined place.
And according to your explain, the index isn't even being used.