Is there a difference between the following two statements:
mysql> EXPLAIN SELECT IF(arms IS NULL, 'asdf', arms) FROM limbs;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | limbs | ALL | NULL | NULL | NULL | NULL | 12 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT IF(arms IS NULL, 'asdf', arms) FROM limbs;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | limbs | ALL | NULL | NULL | NULL | NULL | 12 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
Does one perform better or is preferable over the other? Or are they identical?
IFNULL is just a syntactic sugar for IF, which is a syntactic sugar for ANSI SQL CASE.
So use whichever you like better and whichever fits your particular needs.
PS: EXPLAIN doesn't rely on SELECT
Related
I have a table (MySQL 8.0.26, InnoDB) containing an indexed column of MEDIUMINTs that denote the date a record was created:
date_created MEDIUMINT NOT NULL
INDEX idx_created (date_created)
E.g., the entry "210516" denotes 2021-05-16.
Are the following queries roughly equally efficient in utilizing the index?
WHERE 210000<=date_created AND date_created<220000,
WHERE date_created DIV 10000 = 21,
WHERE date_created LIKE '21%', and
WHERE LEFT(date_created, 2) = '21'
I am currently using WHERE date_created DIV 10000 = 21 in my code but wonder if I should alter all queries to make them more efficient.
Thanks a lot in advance.
Look at the type column in EXPLAIN. If it says "ALL" it means it must do a table-scan of all the rows, evaluating the condition expression for each row. This is not using the index.
mysql> explain select * from mytable where 21000<=date_created and date_created < 22000;
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | mytable | NULL | range | date_created | date_created | 4 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+-----------------------+
mysql> explain select * from mytable where date_created like '21%';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | date_created | NULL | NULL | NULL | 8192 | 11.11 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
mysql> explain select * from mytable where date_created div 10000 = 21;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | NULL | NULL | NULL | NULL | 8192 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
mysql> explain select * from mytable where left(date_created, 2) = '21';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ALL | NULL | NULL | NULL | NULL | 8192 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
MySQL 8.0 supports expression indexes, which helps a couple of the cases:
mysql> alter table mytable add index expr1 ((left(date_created, 2)));
mysql> explain select * from mytable where left(date_created, 2) = '21';
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | expr1 | expr1 | 11 | const | 1402 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
mysql> alter table mytable add index expr2 ((date_created DIV 10000));
mysql> explain select * from mytable where date_created div 10000 = 21;
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
| 1 | SIMPLE | mytable | NULL | ref | expr2 | expr2 | 5 | const | 1402 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+-------+---------+-------+------+----------+-------+
But expression indexes won't help the LIKE '21%' search, because you'd have to hard-code the value '21%' in the expression for the index definition. You could use that index to search for that value only, not for the value of a different year.
Background
I made a small table of 10 rows from a previous SELECT already ran (SavedAnimals).
I have a massive table (animals) which I would like to UPDATE using the rows with the same id as each row in my new table.
What I have tried so far
I can quickly SELECT the desired rows from the big table like this:
mysql> EXPLAIN SELECT * FROM animals WHERE ignored=0 and id IN (SELECT animal_id FROM SavedAnimals);
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | animals | eq_ref | PRIMARY | PRIMARY | 8 | db_staging.SavedAnimals.animal_id | 1 | Using where |
| 2 | MATERIALIZED | SavedAnimals | ALL | NULL | NULL | NULL | NULL | 10 | |
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
But the "same" command on the UPDATE is not quick:
mysql> EXPLAIN UPDATE animals SET ignored=1, ignored_when=CURRENT_TIMESTAMP WHERE ignored=0 and id IN (SELECT animal_id FROM SavedAnimals);
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
| 1 | PRIMARY | animals | index | NULL | PRIMARY | 8 | NULL | 34269464 | Using where |
| 2 | DEPENDENT SUBQUERY | SavedAnimals | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
2 rows in set (0.00 sec)
The UPDATE command never finishes if I run it.
QUESTION
How do I make mariaDB run with the Materialized select_type on the UPDATE like it does on the SELECT?
OR
Is there a totally separate way that I should approach this which would be quick?
Notes
Version: 10.3.23-MariaDB-log
Use JOIN rather than WHERE...IN. MySQL tends to optimize them better.
UPDATE animals AS a
JOIN SavedAnimals AS sa ON a.id = sa.animal_id
SET a.ignored=1, a.ignored_when=CURRENT_TIMESTAMP
WHERE a.ignored = 0
You should find an EXISTS clause more efficient than an IN clause. For example:
UPDATE animals a
SET a.ignored = 1,
a.ignored_when = CURRENT_TIMESTAMP
WHERE a.ignored = 0
AND EXISTS (SELECT * FROM SavedAnimals sa WHERE sa.animal_id = a.id)
I have a working, nice, indexed SQL query aggregating notes (sum of ints) for all my users and others stuffs. This is "query A".
I want to use this aggregated notes in others queries, say "query B".
If I create a View based on "query A", will the indexes of the original query will be used when needed if I join it in "query B" ?
Is that true for MySQL ? For others flavors of SQL ?
Thanks.
In MySQL, you cannot create an index on a view. MySQL uses indexes of the underlying tables when you query data against the views that use the merge algorithm. For the views that use the temptable algorithm, indexes are not utilized when you query data against the views.
https://www.percona.com/blog/2007/08/12/mysql-view-as-performance-troublemaker/
Here's a demo table. It has a userid attribute column and a note column.
mysql> create table t (id serial primary key, userid int not null, note int, key(userid,note));
If you do an aggregation to get the sum of note per userid, it does an index-scan on (userid, note).
mysql> explain select userid, sum(note) from t group by userid;
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)
If we create a view for the same query, then we can see that querying the view uses the same index on the underlying table. Views in MySQL are pretty much like macros — they just query the underlying table.
mysql> create view v as select userid, sum(note) from t group by userid;
Query OK, 0 rows affected (0.03 sec)
mysql> explain select * from v;
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
2 rows in set (0.00 sec)
So far so good.
Now let's create a table to join with the view, and join to it.
mysql> create table u (userid int primary key, name text);
Query OK, 0 rows affected (0.09 sec)
mysql> explain select * from v join u using (userid);
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | test.u.userid | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
3 rows in set (0.01 sec)
I tried to use hints like straight_join to force it to read v then join to u.
mysql> explain select * from v straight_join u on (v.userid=u.userid);
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
"Using join buffer (Block Nested Loop)" is MySQL's terminology for "no index used for the join." It's just looping over the table the hard way -- by reading batches of rows from start to finish of the table.
I tried to use force index to tell MySQL that type=ALL is to be avoided.
mysql> explain select * from v straight_join u force index(PRIMARY) on (v.userid=u.userid);
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | v.userid | 1 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
Maybe this is using an index for the join? But it's weird that table u is before table t in the EXPLAIN. I'm frankly not sure how to understand what it's doing, given the order of rows in this EXPLAIN report. I would expect the joined table should come after the primary table of the query.
I only put a few rows of data into each table. One might get some different EXPLAIN results with a larger representative sample of test data. I'll leave that to you to try.
I am trying to optimize a query on a mysql table I've created. I expect that there will be many many rows in the table. Looking at this question the accepted answer and the top voted answer suggests two different approaches.
I wrote these two queries and want to know which one is more performant.
SELECT uv.*
FROM UserVisit uv INNER JOIN
(SELECT ID,MAX(visitDate) visitDate
FROM UserVisit GROUP BY ID) last
ON (uv.ID = last.ID AND uv.visitDate = last.visitDate);
Running this with EXPLAIN yields:
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | |
| 1 | PRIMARY | uv | eq_ref | PRIMARY | PRIMARY | 11 | last.playscanID,last.visitDate | 1 | |
| 2 | DERIVED | UserVisit | index | NULL | PRIMARY | 11 | NULL | 4 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
3 rows in set (0.01 sec)
And the other query:
SELECT lastVisits.*
FROM ( SELECT * FROM UserVisit ORDER BY visitDate DESC ) lastVisits
GROUP BY lastVisits.ID
Running that with EXPLAIN yields:
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 2 | DERIVED | UserVisit | ALL | NULL | NULL | NULL | NULL | 4 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)
I am uncertain how to interpret the result of the two EXPLAINs.
Which of these queries can I expect to be faster and why?
EDIT:
This is the way UserVisit table looks:
+----------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| ID | bigint(20) unsigned | NO | PRI | NULL | |
| visitDate | date | NO | PRI | NULL | |
| visitTime | time | NO | | NULL | |
| analysisResult | decimal(3,2) | NO | | NULL | |
+----------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Firstly, you might want to read the manual on EXPLAIN. It's a dense read, but it should provide most of the information you want.
Secondly, as Strawberry says, the second query works by accident. The behaviour may change in future versions, and your query would not return an error, just different data. That's nearly always a bad thing.
Finally, the EXPLAIN suggests that version 1 will be faster. In EXTRA, it's saying it's using an index, which is much faster than filesort. Without a schema, it's hard to be sure, but I think you will also benefit from a compound key on ID and visitdate.
Can someone explain to me why I'm seeing the following behavior:
mysql> show index from history_historyentry;
+----------------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| history_historyentry | 0 | PRIMARY | 1 | id | A | 48609 | NULL | NULL | | BTREE | |
+----------------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)
mysql> explain SELECT COUNT(*) FROM `history_historyentry` WHERE `history_historyentry`.`is_deleted` = False;
+----+-------------+----------------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | history_historyentry | ALL | NULL | NULL | NULL | NULL | 48612 | Using where |
+----+-------------+----------------------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT COUNT(*) FROM `history_historyentry` WHERE `history_historyentry`.`is_deleted` = True;
+----+-------------+----------------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | history_historyentry | ALL | NULL | NULL | NULL | NULL | 48613 | Using where |
+----+-------------+----------------------+------+---------------+------+---------+------+-------+-------------+
1 row in set (0.00 sec)
mysql> create index deleted on history_historyentry (is_deleted) ;
Query OK, 48627 rows affected (0.38 sec)
Records: 48627 Duplicates: 0 Warnings: 0
mysql> explain SELECT COUNT(*) FROM `history_historyentry` WHERE `history_historyentry`.`is_deleted` = False;
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+--------------------------+
| 1 | SIMPLE | history_historyentry | index | deleted | deleted | 1 | NULL | 36471 | Using where; Using index |
+----+-------------+----------------------+-------+---------------+---------+---------+------+-------+--------------------------+
1 row in set (0.00 sec)
mysql> explain SELECT COUNT(*) FROM `history_historyentry` WHERE `history_historyentry`.`is_deleted` = True;
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | history_historyentry | ref | deleted | deleted | 1 | const | 166 | Using index |
+----+-------------+----------------------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)
Why the descrepancy in index usage for True vs False? Specifically, in the false case, the ref column is NULL, and the extra column is Using where; Using index. But in the true case, the ref column is const, and the extra column is Using index.
Presumably because one gives good selectivity and the other does not, i.e. only a small percentage of the rows are deleted.
A cost based optimiser will only use an index if it provides good selectivity (typically 10%) or possibly if it is a covering index (one which satifies a query without a further table or bookmark lookup).