Confused about why index is not being used [duplicate] - mysql

When running this EXPLAIN query without an index
EXPLAIN SELECT exec_date,
100 * SUM(CASE WHEN cached = 'no' THEN 1 ELSE 0 END) / SUM(1) cached_no,
100 * SUM(CASE WHEN cached != 'no' THEN 1 ELSE 0 END) / SUM(1) cached_yes
FROM requests
GROUP BY exec_date
This is the output
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE requests ALL NULL NULL NULL NULL 478619 Using temporary; Using filesort
If I create an index
ALTER TABLE requests ADD INDEX exec_date(exec_date);
The output is
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE requests index NULL exec_date 4 NULL 497847
Since the value of Extra is blank, does that mean the key exec_date is not being used?
On a test server, the execution time of the actual (not the EXPLAIN statement) query with and without the index is the same.

Using index doesn't mean what you think it means. When it appears in the Extra column, it indicates that the optimizer is not reading entire rows; it gets the column values from the index alone.
The key can still be in use for other things, for example to perform lookups when you have a WHERE clause. In your specific scenario, the key column shows exec_date, and the disappearance of Using temporary; Using filesort does mean that your index is being utilized: MySQL no longer needs to rearrange the contents of your table into a new temporary table to perform the GROUP BY.
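As a rough sketch of what would make Using index appear for this query: it only reads exec_date and cached, so an index containing both columns could cover it. The index name exec_date_cached is hypothetical, and whether the optimizer picks it (and whether it is any faster) depends on your data and MySQL version.

```sql
-- Hypothetical covering index: both columns the query reads are in the index.
ALTER TABLE requests ADD INDEX exec_date_cached (exec_date, cached);

-- Re-check the plan; with a covering index, Extra would be expected
-- to include "Using index".
EXPLAIN SELECT exec_date,
       100 * SUM(CASE WHEN cached = 'no' THEN 1 ELSE 0 END) / SUM(1) AS cached_no,
       100 * SUM(CASE WHEN cached != 'no' THEN 1 ELSE 0 END) / SUM(1) AS cached_yes
FROM requests
GROUP BY exec_date;
```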

Related

Very slow query when using `id in (max(id))` in subquery

We recently moved our database from MariaDB to AWS Amazon Aurora RDS (MySQL). We observed something strange in a set of queries. We have two queries that are each very quick, but when they are combined as a nested subquery, the result takes ages to finish.
Here id is the primary key of the table
SELECT * FROM users where id in(SELECT max(id) FROM users where id = 1);
execution time is ~350ms
SELECT * FROM users where id in(SELECT id FROM users where id = 1);
execution time is ~130ms
SELECT max(id) FROM users where id = 1;
execution time is ~130ms
SELECT id FROM users where id = 1;
execution time is ~130ms
We believe it has something to do with the type of the value returned by max, which causes the index to be ignored when the outer query runs against the results of the subquery.
All the above queries are simplified for illustration of the problem. The original queries have more clauses as well as 100s of millions of rows. The issue did not exist prior to the migration and worked fine in MariaDB.
--- RESULTS FROM MariaDB ---
MySQL seems to optimize less efficiently than MariaDB (in this case).
When doing this in MySQL (see: DBFIDDLE1), the execution plans look like:
For the query without MAX:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE integers null const PRIMARY PRIMARY 4 const 1 100.00 Using index
1 SIMPLE integers null const PRIMARY PRIMARY 4 const 1 100.00 Using index
For the query with MAX:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY integers null index null PRIMARY 4 null 1000 100.00 Using where; Using index
2 DEPENDENT SUBQUERY null null null null null null null null null Select tables optimized away
MariaDB (see: DBFIDDLE2) does have a better-looking plan when using MAX:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <subquery2> system null null null null 1 100.00
1 PRIMARY integers const PRIMARY PRIMARY 4 const 1 100.00 Using index
2 MATERIALIZED null null null null null null null null Select tables optimized away
EDIT: Because of time (some lack of it 😉) I now add some info
A suggestion to fix this:
SELECT *
FROM integers
WHERE i IN (select * from (SELECT MAX(i) FROM integers WHERE i=1)x);
Looking at the execution plan from MariaDB, which has one extra step, I tried to do the same in MySQL. The query above has an even bigger execution plan, but tests show that it performs better (for the explain plans, see: DBFIDDLE1a).
"the question is Mariadb that much faster? it uses a step more that mysql"
One step more does not mean that things get slower.
MySQL takes about 2-3 seconds on the query using MAX, while MariaDB executes the same query in under 10 ms. But this is performance, and timings may vary between systems.
SELECT max(id) FROM users where id = 1
Is strange. Since it is looking only at rows where id = 1, the "max" is obviously 1. So is the min. And the average.
Perhaps you wanted:
SELECT max(id) FROM users
Is there an index on id? Perhaps the PRIMARY KEY? If not, then that might explain the sluggishness.
This can be done much faster (again assuming an index):
SELECT * FROM users
ORDER BY id DESC
LIMIT 1
Does that give you what you want?
To discuss this further, please provide SHOW CREATE TABLE users

MySQL Array Variable (No stored procedure, No temporary table)

As mentioned in the title, I would like to know of any solution for this that does not use a stored procedure, a temporary table, etc.
Compare Query #1 and Query #3: Query #3 gets a worse performance result. Is there any workaround that puts the values in a variable without hurting performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL range uk_transaction_no uk_transaction_no 22 NULL 2 100 Using where; Using index
Query #2
set @transactionNos = "10001,10004";
There are no results to be displayed.
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order NULL index NULL uk_transaction_no 22 NULL 4 100 Using where; Using index
Short Answer: See Sargability
Long Answer:
MySQL makes no attempt to use an index on a column when that column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query 3 will always be performed as a full table scan or a full index scan.
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values. IN(?) will not work in most APIs to MySQL.
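One way to keep the IN list sargable while still passing the values in variables is a server-side prepared statement with one placeholder per value. This is only a sketch: the variable names @t1 and @t2 and the statement name stmt are made up for illustration, and the number of placeholders has to match the number of values, so the statement text must be rebuilt when the list length changes.

```sql
-- One user variable per value (names are hypothetical).
SET @t1 = '10001', @t2 = '10004';

-- One placeholder per value keeps transaction_no out of any function call,
-- so the optimizer can still consider uk_transaction_no.
PREPARE stmt FROM 'SELECT * FROM `order` WHERE transaction_no IN (?, ?)';
EXECUTE stmt USING @t1, @t2;
DEALLOCATE PREPARE stmt;
```

Client libraries that support bound parameters can do the same thing by generating the "?, ?" part to match the length of the list.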

Two Left Joins, only one is using an index

I'll show my query first, then the tables. I've tried forcing the index, but it just won't use it. When that part of the query is run as a new query on its own, it uses the index and is fast, but since it won't in my full query, it's incredibly slow / never completes.
SELECT p.*, INET_NTOA(p.ip) AS ipStr, qs.score, db.*
FROM proxies p
LEFT JOIN ipdb db FORCE INDEX FOR JOIN (`iprange`)
ON db.ipStart <= p.ip AND db.ipEnd >= p.ip
LEFT JOIN ipqs qs ON qs.ip = p.ip
WHERE expiration_date < '2021-09-18'
ORDER BY expiration_date
LIMIT 500
'iprange' is an index on ipStart + ipEnd.
There are indexes on p.ip and expiration_date
Explain results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p range expirationdate expirationdate 4 NULL 2547 Using index condition
1 SIMPLE db ALL iprange NULL NULL NULL 8334413 Range checked for each record (index map: 0x2)
1 SIMPLE qs eq_ref PRIMARY PRIMARY 4 adscend_Aff.p.ip 1 NULL
The query of ipdb, run by itself, sometimes uses the index and sometimes doesn't. When it doesn't, it takes 17 seconds; when it does, it takes 0.4 seconds.
explain SELECT * FROM ipdb db WHERE db.ipStart <= 785476891 AND db.ipEnd >= 785476891;
explain SELECT * from ipdb db where db.ipStart <= 16941057 AND db.ipEnd >= 16941057;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE db ALL iprange NULL NULL NULL 8334413 Using where
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE db range iprange iprange 4 NULL 86 Using index condition
When I force the index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE db range iprange iprange 4 NULL 1818402 Using index condition
and takes 1.8 seconds.
I tried FORCE INDEX instead of FORCE INDEX FOR JOIN in the larger query, but it made no difference. I'm not sure how to address this. I also tried splitting this into two steps and doing the second step within a PHP loop, but it's still crazy slow that way.
If ipStart is at the wrong end of the table, forcing that index will force the query to go through most of the table. You can't win.
Start over on designing the table and the queries. Here is a technique that runs O(1) instead of O(N): http://mysql.rjweb.org/doc.php/ipranges
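A minimal sketch of the general idea, not necessarily the exact schema the linked page describes: if the ranges in ipdb never overlap, the only candidate row for an address is the one with the largest ipStart that is still at or below it. That row can be found by reading a single index entry, and ipEnd is then checked on that one row. The non-overlap assumption and the alias r are mine.

```sql
-- Assumes non-overlapping ranges and an index that starts with ipStart.
SELECT r.*
FROM (
    -- Walk the index backwards from the address and stop at the first entry.
    SELECT *
    FROM ipdb
    WHERE ipStart <= 785476891
    ORDER BY ipStart DESC
    LIMIT 1
) AS r
-- Reject the candidate if the address falls in a gap after its range.
WHERE r.ipEnd >= 785476891;
```

Because the inner query stops after one row, its cost should not depend on whether the address sits near the start or the end of the table.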

Limited SQL query returns all rows instead of one

I tried the SQL code:
explain SELECT * FROM myTable LIMIT 1
As a result I got:
id select_type table type possible_keys key key_len ref **rows**
1 SIMPLE myTable ALL NULL NULL NULL NULL **32117**
Do you know why the query would run through all rows instead of simply picking the first row?
What can I change within the query (or in my table) to reduce the number of rows examined for a similar result?
The rows count shown is only an estimate of the number of rows to examine. It is not always equal to the actual number of rows examined when you run the query.
In particular:
LIMIT is not taken into account while estimating number of rows. Even if you have LIMIT which restricts how many rows will be examined, MySQL will still print full number.
Source
When the query actually runs only one row will be examined.
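If you want to check this on your own server rather than trust the estimate, one option is to look at the session's handler counters after running the real query. This is a sketch; FLUSH STATUS may require the RELOAD privilege, and the exact counters that change depend on the storage engine and plan.

```sql
-- Reset this session's status counters.
FLUSH STATUS;

-- Run the real query, not the EXPLAIN.
SELECT * FROM myTable LIMIT 1;

-- The Handler_read_* counters show how many row reads the statement performed.
SHOW SESSION STATUS LIKE 'Handler_read%';
```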
Edited for use of subselect:
Assuming the primary key is "my_id", use WHERE. For instance:
select * from mytable
where my_id = (
select max(my_id) from mytable
)
While this seems less efficient at first, EXPLAIN shows the plan below: just one row is returned, and the max is found from the index. I do not suggest doing this against partitioned tables in MySQL:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mytable const PRIMARY PRIMARY 4 const 1
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away

No index on !=?

Consider the following two EXPLAINs:
EXPLAIN SELECT * FROM sales WHERE title != 'The'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales ALL title NULL NULL NULL 41707 Using where
And -
EXPLAIN SELECT * FROM sales WHERE title = 'The'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales ref title title 767 const 1 Using where
Why does the != query have a NULL key? Why doesn't it use title? What causes a = statement to be able to utilize an index but not a !=?
There is no point in using the index unless title is exactly 'The' very frequently.
Since almost every row needs to be selected you don't gain anything from using an index. It can actually be costly to use an index, which is probably what your MySQL engine is determining, so it is opting not to use the index.
Compare the amount of work done in these two situations:
Using the index:
1) Read the entire index tree into memory.
2) Search the index tree for the value 'The' and filter out those entries.
3) Read every row except for the few exceptions (which probably are in the same blocks on the disk as rows that do need to be read, so really the whole table is likely to be read in) from the table into memory.
Without the index:
1) Read every row into memory and, while reading them, filter out any where title = 'The' from the result set.
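To see which situation your own table is in, you could measure how many rows the != predicate actually excludes, and compare plans when the index is forced. This is a sketch; the alias fraction_equal is made up, and FORCE INDEX is only meant as an experiment here, not a recommendation.

```sql
-- Fraction of rows with title = 'The', i.e. the rows that != would skip.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
SELECT SUM(title = 'The') / COUNT(*) AS fraction_equal
FROM sales;

-- Experiment: ask the optimizer to prefer the index, then compare
-- the plan and the real timings against the unforced query.
EXPLAIN SELECT * FROM sales FORCE INDEX (title) WHERE title != 'The';
```

If the fraction is small, nearly every row has to be read either way, which is why a full table scan is usually the cheaper plan.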