MySQL converting subquery into dependent subquery - mysql

Hi, I do not understand why the subquery of the given query is being converted into a dependent subquery,
although the subquery does not depend on the main query (it does not reference any table of the outer query).
I know that this query can be optimized using joins, but here I just want to know the reason for the conversion.
MySQL version 5.5
EXPLAIN SELECT id FROM `cab_request_histories`
WHERE cab_request_histories.id = any(SELECT id
FROM cab_requests
WHERE cab_requests.request_type = 'pickup')
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY cab_request_histories index NULL PRIMARY 4 NULL 20
2 DEPENDENT SUBQUERY cab_requests unique_subquery PRIMARY PRIMARY 4 func 1

I suspect that the ANY keyword will require MySQL to pass the values from outside the subquery to inside it to evaluate whether the result is true.

The MySQL optimizer uses the EXISTS strategy for this query, effectively changing it to something like:
SELECT id FROM cab_request_histories
WHERE EXISTS
( SELECT 'this one is dependent' FROM cab_requests
WHERE cab_requests.request_type = 'pickup'
AND cab_requests.id = cab_request_histories.id )
You can see what the optimizer does with your query by running EXPLAIN EXTENDED your_query followed by SHOW WARNINGS.
This type of optimization is described at http://dev.mysql.com/doc/refman/5.5/en/subquery-optimization-with-exists.html.
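For example, with the query from the question, the rewritten statement shows up as a Note in the Message column:
EXPLAIN EXTENDED SELECT id FROM cab_request_histories
WHERE cab_request_histories.id = ANY (SELECT id
                                      FROM cab_requests
                                      WHERE cab_requests.request_type = 'pickup');
-- the optimizer's rewritten form of the statement appears here
SHOW WARNINGS;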

Related

Very slow query when using `id in (max(id))` in subquery

We recently moved our database from MariaDB to AWS Amazon Aurora RDS (MySQL). We observed something strange in a set of queries. We have two queries that are very quick on their own, but when they are combined as a nested subquery, it takes ages to finish.
Here id is the primary key of the table
SELECT * FROM users where id in(SELECT max(id) FROM users where id = 1);
execution time is ~350ms
SELECT * FROM users where id in(SELECT id FROM users where id = 1);
execution time is ~130ms
SELECT max(id) FROM users where id = 1;
execution time is ~130ms
SELECT id FROM users where id = 1;
execution time is ~130ms
We believe it has something to do with the type of value returned by max, which causes the index to be ignored when the outer query runs against the result of the subquery.
All of the above queries are simplified to illustrate the problem. The original queries have more clauses, and the tables have hundreds of millions of rows. The issue did not exist prior to the migration; the queries worked fine in MariaDB.
--- RESULTS FROM MariaDB ---
MySQL seems to optimize less efficiently than MariaDB (in this case).
When doing this in MySQL (see: DBFIDDLE1), the execution plans look like:
For the query without MAX:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE integers null const PRIMARY PRIMARY 4 const 1 100.00 Using index
1 SIMPLE integers null const PRIMARY PRIMARY 4 const 1 100.00 Using index
For the query with MAX:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY integers null index null PRIMARY 4 null 1000 100.00 Using where; Using index
2 DEPENDENT SUBQUERY null null null null null null null null null Select tables optimized away
While MariaDB (see: DBFIDDLE2) does have a better-looking plan when using MAX:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <subquery2> system null null null null 1 100.00
1 PRIMARY integers const PRIMARY PRIMARY 4 const 1 100.00 Using index
2 MATERIALIZED null null null null null null null null Select tables optimized away
EDIT: Because of time (some lack of it 😉), I am only now adding some info.
A suggestion to fix this:
SELECT *
FROM integers
WHERE i IN (SELECT * FROM (SELECT MAX(i) FROM integers WHERE i=1) x);
Looking at the execution plan from MariaDB, which has one extra step, I tried to do the same in MySQL. The above query has an even bigger execution plan, but tests show that it performs better (for the explain plans, see: DBFIDDLE1a).
"the question is Mariadb that much faster? it uses a step more that mysql"
One step more does not mean that things get slower.
MySQL takes about 2-3 seconds on the query using MAX, while MariaDB executes the same query in under 10 ms. But this is performance, and times may vary on different systems.
SELECT max(id) FROM users where id = 1
is strange. Since it is looking only at rows where id = 1, the "max" is obviously "1". So is the min. And the average.
Perhaps you wanted:
SELECT max(id) FROM users
Is there an index on id? Perhaps the PRIMARY KEY? If not, then that might explain the sluggishness.
This can be done much faster (again, assuming an index):
SELECT * FROM users
ORDER BY id DESC
LIMIT 1
Does that give you what you want?
To discuss this further, please provide SHOW CREATE TABLE users

SQL query with subquery takes longer than both queries separately

Problem
I have two queries where one needs the result of the other one. My first guess was to use an independent subquery:
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= (
SELECT MIN(P1.delivery_start)
FROM ExampleTable P1
WHERE 1641288602 < P1.delivery_end
);
The entire query takes 5-6 seconds, which is way too long for my application. Running the queries one after another takes only around 800 ms in total:
SELECT MIN(P1.delivery_start)
FROM ExampleTable P1
WHERE 1641288602 < P1.delivery_end;
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= 1641286800;
I am using MariaDB 10.2 and have indexes on both delivery_start and delivery_end.
What I have tried
I have used a CTE instead of a subquery, which resulted in the same performance. Using a variable with SET yields results similar to running both queries separately, so that is what I will use for the time being.
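A sketch of that SET-variable workaround, based on the queries above (the variable name is made up):
-- run the aggregate once and keep the scalar in a user variable
SET @min_start = (SELECT MIN(P1.delivery_start)
                  FROM ExampleTable P1
                  WHERE 1641288602 < P1.delivery_end);

-- the outer query now compares delivery_start to a plain value,
-- just like the second of the two separate queries
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= @min_start;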
I ran EXPLAIN on all 3 Queries:
1. Query with subquery
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY P2 ALL delivery_start NULL NULL NULL 6388282 Using where
2 SUBQUERY P1 range delivery_end delivery_end 4 NULL 36378 Using index condition
2. Separate Queries
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE P1 range delivery_end delivery_end 4 NULL 36432 Using index condition

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE P2 range delivery_start delivery_start 4 NULL 35944 Using index condition
Question
I think the issue is shown in the first EXPLAIN table, as it has type ALL, which means the database performs a full table scan. My question is simply: why? Is the optimizer not able to figure out that the subquery produces a single number, so that only a range-type query is needed? And why does it not use any index?
The problem is described in the MariaDB docs:
In all remaining cases when NULL cannot be substituted with FALSE, it is not possible to use index lookups. This is not a limitation in the server, but a consequence of the NULL semantics in the ANSI SQL standard.
There is a full examination here:
https://mariadb.com/kb/en/non-semi-join-subquery-optimizations/
Your subquery can potentially return NULL in the case where no rows are found. Hence, MariaDB cannot use the index for the parent query.
You must rewrite your subquery in a way that it always returns a row with a non-NULL scalar, or stick with two separate queries. However, what should happen if your first query returns NULL? With a compound statement you could put an IF around the second query and not execute it at all if the first returns NULL.
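For illustration, one way to force a non-NULL scalar is to fall back to a sentinel that matches no rows. This is only a sketch (the sentinel assumes delivery_start holds 32-bit Unix timestamps), and whether the optimizer then switches to the range plan should be checked with EXPLAIN:
SELECT P2.*
FROM ExampleTable P2
WHERE P2.delivery_start >= (
    -- COALESCE turns an empty MIN() result (NULL) into a value larger
    -- than any real delivery_start, so the comparison matches nothing
    SELECT COALESCE(MIN(P1.delivery_start), 2147483647)
    FROM ExampleTable P1
    WHERE 1641288602 < P1.delivery_end
);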
Replace these
INDEX(delivery_start)
INDEX(delivery_end)
with these:
INDEX(delivery_start, delivery_end)
INDEX(delivery_end, delivery_start)
The second one will help significantly with the subquery. Then the first may help with the outer query.
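Assuming the existing single-column indexes are named after their columns (check SHOW CREATE TABLE for the real names), the change would look roughly like:
ALTER TABLE ExampleTable
    DROP INDEX delivery_start,                 -- assumed index name
    DROP INDEX delivery_end,                   -- assumed index name
    ADD INDEX (delivery_start, delivery_end),
    ADD INDEX (delivery_end, delivery_start);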
(If those don't help, please add SHOW CREATE TABLE, EXPLAIN SELECT ... and table sizes.)

MySQL Array Variable (No Stored Procedure, No Temporary Table)

As mentioned in the title, I would like to know of any solution for this that does not use a stored procedure, temporary table, etc.
Comparing Query #1 and Query #3, Query #3 gets the worse performance result. Is there any workaround to put the values in a variable without impacting the performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order null range uk_transaction_no uk_transaction_no 22 null 2 100 Using where; Using index
Query #2
set @transactionNos = "10001,10004";
There are no results to be displayed.
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE order null index null uk_transaction_no 22 null 4 100 Using where; Using index
Short Answer: See Sargeability
Long Answer:
MySQL makes no attempt to optimize an expression when an indexed column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query 3 will always be performed as a full table scan or a full index scan.
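The same rule applies to the other functions mentioned; a generic illustration with DATE(), assuming a table with an indexed created_at column (names made up):
-- not sargeable: created_at is hidden inside DATE(), so a full scan is needed
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-01';

-- sargeable rewrite: the indexed column is compared directly to constants
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-01-02';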
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values. IN(?) will not work in most APIs to MySQL.
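One session-level workaround, without a stored procedure or temporary table, is to splice the list into the SQL text with a prepared statement. This is a sketch that assumes the values never contain quotes or embedded commas:
SET @transactionNos = "10001,10004";

-- turn 10001,10004 into '10001','10004' and build the IN list
SET @sql = CONCAT("SELECT * FROM `order` WHERE transaction_no IN ('",
                  REPLACE(@transactionNos, ",", "','"),
                  "')");

PREPARE stmt FROM @sql;
EXECUTE stmt;              -- this form can use the uk_transaction_no index
DEALLOCATE PREPARE stmt;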

DISTINCT causing full table scan

I have a table in MySQL (5.5.31) which has about 20M rows. The following query:
SELECT DISTINCT mytable.name name FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0 ORDER BY mytable.date_modified DESC LIMIT 0,21
is causing a full table scan, with EXPLAIN saying the type is ALL and the extra info is Using where; Using temporary; Using filesort. EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable ALL NULL NULL NULL NULL 19001156 Using where; Using temporary; Using filesort
1 SIMPLE mytable_c eq_ref PRIMARY PRIMARY 108 mytable.id 1 Using index
Without the join explain looks like:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable index NULL mytablemod 9 NULL 21 Using where; Using temporary
id_c is the primary key of mytable_c, and mytable_c has at most one row for every row in mytable. date_modified is indexed, but it looks like MySQL does not take advantage of that. If I remove the DISTINCT clause, EXPLAIN uses the index and touches only 21 rows, just as expected. If I remove the join, it also does this. Is there any way to make the query work without the full table scan while keeping the join? EXPLAIN shows MySQL knows it needs only one row from mytable_c and that it is using the primary key, but it still does a full scan on mytable.
The reason DISTINCT is there is that the query is generated by an ORM system in which there might be cases where multiple rows are produced by JOINs, but the values of the SELECT fields will always be unique (i.e. if the JOIN is against a multi-value link, only fields that are the same in every joined row will be in the SELECT).
These are just generic comments, not MySQL specific.
To find all the possible name values from mytable a full scan of either the table or an index needs to happen. Possible options:
full table scan
full index scan of an index starting with deleted (take advantage of the filter)
full index scan of an index starting with name (only column of concern for output)
If there was an index on deleted, the server could find all the deleted = 0 index entries and then look up the corresponding name value from the table. But if deleted has low cardinality or the statistics aren't there to say differently, it could be more expensive to do the double reads of first the index then the corresponding data item. In that case, just scan the table.
If there was an index on name, an index scan could be sufficient, but then the table needs to be checked for the filter. Again frequent hopping from index to table.
The join columns also need to be considered in a similar manner.
If you forget about the join part and have a multi-part index on the columns name, deleted, then an index scan would probably happen.
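For illustration, such an index could be added like this (the index name is made up):
-- covers the output column (name) and the filter (deleted)
ALTER TABLE mytable ADD INDEX idx_name_deleted (name, deleted);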
Update
To me, the DISTINCT and ORDER BY parts are a bit confusing. Which name record's date_modified is to be used for sorting? I think something like this would be a bit clearer:
SELECT mytable.name name --, MIN(mytable.date_modified)
FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
GROUP BY mytable.name
ORDER BY MIN(mytable.date_modified) DESC LIMIT 0,21
Either way, once the ORDER BY comes into play, a full scan needs to be done to find the order. Without the ORDER BY, the first 21 found could suffice.
Why don't you try moving the condition mytable.deleted = 0 from the WHERE clause to the JOIN ON clause? You can also try FORCE INDEX (mytablemod).
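A sketch of the FORCE INDEX variant, reusing the mytablemod index name from the EXPLAIN output above:
SELECT DISTINCT mytable.name name
FROM mytable FORCE INDEX (mytablemod)   -- hint the date_modified index
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
ORDER BY mytable.date_modified DESC
LIMIT 0, 21;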

Limited SQL query returns all rows instead of one

I tried the SQL code:
explain SELECT * FROM myTable LIMIT 1
As a result I got:
id select_type table type possible_keys key key_len ref **rows**
1 SIMPLE myTable ALL NULL NULL NULL NULL **32117**
Do you know why the query would run through all rows instead of simply picking the first row?
What can I change within the query (or in my table) to reduce the number of rows examined for a similar result?
The rows count shown is only an estimate of the number of rows to examine. It is not always equal to the actual number of rows examined when you run the query.
In particular:
LIMIT is not taken into account while estimating the number of rows. Even if you have a LIMIT which restricts how many rows will be examined, MySQL will still print the full number.
Source
When the query actually runs, only one row will be examined.
Edited for use of subselect:
Assuming the primary key is "my_id", use WHERE. For instance:
select * from mytable
where my_id = (
select max(my_id) from mytable
)
While this seems less efficient at first, the EXPLAIN result is as follows, with just one row returned and a read on the index to find the max. I do not suggest doing this against partitioned tables in MySQL:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mytable const PRIMARY PRIMARY 4 const 1
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away