MySQL Merge Index Optimization not working - mysql

I was trying to reproduce the Index Merge optimization in MySQL as described at http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html, but I have no idea why I don't get type = 'index_merge' in the EXPLAIN output.
Here is the table definition:
create table hotel(
index1 int not null,
nume varchar(100),
index2 int
);
CREATE UNIQUE INDEX hotel_index1 ON hotel (index1);
CREATE UNIQUE INDEX hotel_index2 ON hotel (index2);
insert into hotel(index1,nume,index2) values (1,'primu',1),
(2,'al2lea',2),(5,'al3lea',4),(4,'al4lea',5);
Then I run a select like the one in the documentation:
explain extended select * from hotel where index1 = 5 or index2 = 4;
and the result row from explain is:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE hotel const hotel_index1,hotel_index2 (null) (null) (null) 4 100 Using where
What am I doing wrong with the indexes, so that they are not merged as in theory?

SQL optimizers are often really bad at optimizing OR conditions. An alternative is to use union or union all. The exact equivalent is:
select h.*
from hotel h
where h.index1 = 5
union
select h.*
from hotel h
where index2 = 4;
This should use the indexes correctly (assuming the tables are large enough to take advantage of an index).
Note: this uses union. If you don't need the elimination of duplicates, then use union all.
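As a sketch of the union all variant: a row can match both conditions, so the second branch below excludes rows already returned by the first, and no duplicate elimination is needed. This relies on index1 being NOT NULL, as in the question's table definition; it is one possible rewrite, not something taken from the MySQL documentation.

```sql
-- Rows matching the first condition.
select h.*
from hotel h
where h.index1 = 5
union all
-- Rows matching only the second condition. index1 is NOT NULL,
-- so "index1 <> 5" never evaluates to NULL and cannot drop rows by accident.
select h.*
from hotel h
where h.index2 = 4
  and h.index1 <> 5;
```

With the sample data, the row (5,'al3lea',4) matches both conditions and is returned once, by the first branch.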
EDIT:
The documentation has this insightful comment:
The choice between different possible variants of the Index Merge
access method and other access methods is based on cost estimates of
various available options.
Apparently, the cost estimates don't lead the optimizer to the optimal choice.

The reason I didn't see the Index Merge optimization is that, in my case, it only applies when I add a LIMIT: MySQL then splits the OR condition, picks the best index for each part, runs each search, and finally unions the results.
explain extended select * from hotel where index1 = 5 OR index2 = 4 limit 3;
Now I get the expected plan:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE hotel index_merge hotel_index1,hotel_index2 hotel_index1,hotel_index2 4,5 (null) 2 100 Using union(hotel_index1,hotel_index2); Using where

Related

Very slow query when using `id in (max(id))` in subquery

We recently moved our database from MariaDB to AWS Amazon Aurora RDS (MySQL). We observed something strange in a set of queries. We have two queries that are very quick on their own, but when combined as a nested subquery they take ages to finish.
Here id is the primary key of the table
SELECT * FROM users where id in(SELECT max(id) FROM users where id = 1);
execution time is ~350ms
SELECT * FROM users where id in(SELECT id FROM users where id = 1);
execution time is ~130ms
SELECT max(id) FROM users where id = 1;
execution time is ~130ms
SELECT id FROM users where id = 1;
execution time is ~130ms
We believe it has something to do with the type of the value returned by max, which causes the index to be ignored when the outer query runs on the result of the subquery.
All the above queries are simplified for illustration of the problem. The original queries have more clauses as well as 100s of millions of rows. The issue did not exist prior to the migration and worked fine in MariaDB.
--- RESULTS FROM MariaDB ---
MySQL seems to optimize less efficiently than MariaDB (in this case).
When doing this in MySQL (see: DBFIDDLE1), the execution plans look like:
For the query without MAX:
id  select_type  table     partitions  type   possible_keys  key      key_len  ref    rows  filtered  Extra
1   SIMPLE       integers  null        const  PRIMARY        PRIMARY  4        const  1     100.00    Using index
1   SIMPLE       integers  null        const  PRIMARY        PRIMARY  4        const  1     100.00    Using index
For the query with MAX:
id  select_type         table     partitions  type   possible_keys  key      key_len  ref   rows  filtered  Extra
1   PRIMARY             integers  null        index  null           PRIMARY  4        null  1000  100.00    Using where; Using index
2   DEPENDENT SUBQUERY  null      null        null   null           null     null     null  null  null      Select tables optimized away
MariaDB (see: DBFIDDLE2), on the other hand, has a better-looking plan when using MAX:
id  select_type   table     type    possible_keys  key      key_len  ref    rows  filtered  Extra
1   PRIMARY                 system  null           null     null     null   1     100.00
1   PRIMARY       integers  const   PRIMARY        PRIMARY  4        const  1     100.00    Using index
2   MATERIALIZED  null      null    null           null     null     null   null  null      Select tables optimized away
EDIT: Because of time (some lack of it 😉), I am only now adding some info.
A suggestion to fix this:
SELECT *
FROM integers
WHERE i IN (select * from (SELECT MAX(i) FROM integers WHERE i=1)x);
Looking at the execution plan from MariaDB, which has one extra step, I tried to do the same in MySQL. The above query has an even bigger execution plan, but tests show that it performs better (for the explain plans, see: DBFIDDLE1a).
"the question is Mariadb that much faster? it uses a step more that mysql"
One step more does not mean that things get slower.
MySQL takes about 2-3 seconds on the query using MAX, while MariaDB executes the same query in under 10 ms. But this is performance, and timings may vary between systems.
SELECT max(id) FROM users where id = 1
Is strange. Since it looks only at rows where id = 1, "max" is obviously 1. So is the min. And the average.
Perhaps you wanted:
SELECT max(id) FROM users
Is there an index on id? Perhaps the PRIMARY KEY? If not, then that might explain the sluggishness.
This can be done much faster (again assuming an index):
SELECT * FROM users
ORDER BY id DESC
LIMIT 1
Does that give you what you want?
To discuss this further, please provide SHOW CREATE TABLE users

MySQL Array Variable (No stored procedure, no temporary table)

As mentioned in the title, I would like to know whether there is a solution for this that does not use a stored procedure, a temporary table, etc.
Comparing Query #1 and Query #3, Query #3 performs worse. Is there a workaround that keeps the list in a variable without hurting performance?
Schema (MySQL v5.7)
create table `order`(
id BIGINT(20) not null auto_increment,
transaction_no varchar(20) not null,
primary key (`id`),
unique key uk_transaction_no (`transaction_no`)
);
insert into `order` (`id`,`transaction_no`)
value (1,'10001'),(2,'10002'),(3,'10003'),(4,'10004');
Query #1
explain select * from `order` where transaction_no in ('10001','10004');
id  select_type  table  partitions  type   possible_keys      key                key_len  ref   rows  filtered  Extra
1   SIMPLE       order  NULL        range  uk_transaction_no  uk_transaction_no  22       NULL  2     100       Using where; Using index
Query #2
set @transactionNos = "10001,10004";
There are no results to be displayed.
Query #3
explain select * from `order` where find_in_set(transaction_no, @transactionNos);
id  select_type  table  partitions  type   possible_keys  key                key_len  ref   rows  filtered  Extra
1   SIMPLE       order  NULL        index  NULL           uk_transaction_no  22       NULL  4     100       Using where; Using index
Short Answer: See sargability
Long Answer:
MySQL makes no attempt to use an index for an expression in which the indexed column is hidden inside a function call such as FIND_IN_SET(), DATE(), etc. Your Query 3 will always be performed as a full table scan or a full index scan.
So the only way to optimize what you are doing is to construct
IN ('10001','10004')
That is often difficult to achieve when "binding" a list of values. IN(?) will not work in most APIs to MySQL.
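One possible workaround that stays within plain SQL (no stored procedure, no temporary table) is a server-side prepared statement whose IN list is built as text. This is only a sketch: the variable @sql and the statement name stmt are illustrative, the double-quoted string assumes ANSI_QUOTES is not enabled, and because the list is concatenated into the statement it must come from trusted input.

```sql
-- Keep the list as already-quoted literals so the comparison stays
-- string-to-string against the varchar column.
SET @transactionNos = "'10001','10004'";

-- Build the statement text with a literal IN (...) list,
-- then prepare and execute it.
SET @sql = CONCAT(
  'SELECT * FROM `order` WHERE transaction_no IN (', @transactionNos, ')'
);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

The statement text the server receives is the same as Query #1, so the column is no longer wrapped in a function call and the unique index on transaction_no can be considered for a range lookup.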

MYSQL improve query performance when using OR

I have a very simple MYSQL table with 2 columns and I run this query:
SELECT * FROM table WHERE (col1 = '123' AND col2 = '456')
OR (col1 = '456' AND col2 = '123')
Col1 and col2 form a composite primary key: PRIMARY KEY('col1','col2'). Each of them is also a foreign key referencing the primary key of another table.
When I ran the EXPLAIN command for the above query i got the following:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table index PRIMARY,col2 col2 8 NULL 1 Using where; Using index
The type in the above result is index, which is very similar to ALL and so very likely to be slow on a large database. Is there a way to improve the above select command?
In reality, statements such as "likely to be slow on a large database" should be a red flag.
If you're going to have a large dataset, profiling and testing is vital to determine firstly if it will be a problem and then if it will be enough of a problem to warrant development time and cost to address. Usually this means micro optimisations that are unlikely to have any impact on most code bases.
Anyway, let's answer the question.
Yes, hypothetically. The query is using an index, and if you have huge amounts of data and query this table a lot, it can potentially be optimised by splitting the query into multiple result sets rather than combining the conditions with OR in a single query. If you are only going to match two pairs, as in your example, you could get better performance with a union such as:
(
SELECT * FROM test
WHERE
(
col1 = 123 AND col2 = 456
)
)
UNION
(
SELECT * FROM test
WHERE
(
col1 = 456 AND col2 = 123
)
)
An EXPLAIN for this query is as follows:
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA
1 PRIMARY test ref PRIMARY PRIMARY 4 const 1 Using where; Using index
2 UNION test ref PRIMARY PRIMARY 4 const 2 Using where; Using index
(null) UNION RESULT <union1,2> ALL (null) (null) (null) (null) (null)
Take a look at this SQL fiddle http://sqlfiddle.com/#!2/9dc07a/1/0 for a simple test case.
The language I've used in this post, such as "might" and "could", is because I've not front-loaded this example with hundreds of millions of records. I would strongly suggest you do this and evaluate and profile your query in more detail.
Unfortunately, with optimisation there isn't always a clear and simple answer of doing x to get greater performance. The query optimiser is a complex beast, and sometimes trying to squeeze out every drop of performance can actually cripple your application (I'm speaking from experience here). So please, unless you have to worry about these micro optimisations, don't; if you do, then evaluate, profile and test fully before deciding on an approach.
The cost based optimiser chooses an execution plan based on the statistics it has about the table. It knows there are not many rows in there, and so it's a waste of time doing something clever. Increase the number of rows in the table dramatically and ensure you have a high cardinality (lots of different values) and run the explain plan again and you'll see the execution change.
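To try this out, the table needs far more rows than the fiddle has. Below is a rough sketch for generating 100,000 distinct key pairs on older MySQL versions without recursive CTEs. It assumes the test table from the fiddle has integer columns col1 and col2 with PRIMARY KEY (col1, col2) and no foreign keys that would reject the generated values; digits is a hypothetical helper table.

```sql
-- Helper table holding the digits 0..9 (hypothetical name).
CREATE TABLE digits (n INT NOT NULL PRIMARY KEY);
INSERT INTO digits (n) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- 1000 values for col1 (0..999) times 100 values for col2 (0..99)
-- gives 100,000 distinct (col1, col2) pairs.
-- IGNORE skips any pair that already exists in the table.
INSERT IGNORE INTO test (col1, col2)
SELECT a.n + 10 * b.n + 100 * c.n,
       d.n + 10 * e.n
FROM digits a
CROSS JOIN digits b
CROSS JOIN digits c
CROSS JOIN digits d
CROSS JOIN digits e;

-- Refresh the index statistics, then look at the plan again.
ANALYZE TABLE test;
EXPLAIN
SELECT * FROM test
WHERE (col1 = 123 AND col2 = 456)
   OR (col1 = 456 AND col2 = 123);

-- Clean up the helper table.
DROP TABLE digits;
```

Comparing the EXPLAIN output before and after the insert shows whether the optimiser changes its choice once the statistics reflect a larger, high-cardinality table.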

Mysql converting subquery into dependent subquery

Hi, I don't understand why the subquery in the query below is being converted into a dependent subquery,
although the subquery does not depend on the main query (it does not use the main query's table).
I know that this query can be optimized using joins, but here I just want to know the reason for this.
MYSQL Version 5.5
EXPLAIN SELECT id FROM `cab_request_histories`
WHERE cab_request_histories.id = any(SELECT id
FROM cab_requests
WHERE cab_requests.request_type = 'pickup')
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY cab_request_histories index NULL PRIMARY 4 NULL 20
2 DEPENDENT SUBQUERY cab_requests unique_subquery PRIMARY PRIMARY 4 func 1
I suspect that the ANY keyword will require MySQL to pass the values from outside the subquery to inside it to evaluate whether the result is true.
The MySQL optimizer uses the EXISTS strategy for this query, effectively changing it to something like:
SELECT id FROM cab_request_histories
WHERE EXISTS
( SELECT 'this one is dependent' FROM cab_requests
WHERE cab_requests.request_type = 'pickup'
AND cab_requests.id = cab_request_histories.id )
You can see what the optimizer does with your query by running EXPLAIN EXTENDED your_query followed by SHOW WARNINGS.
This type of optimization is described in http://dev.mysql.com/doc/refman/5.5/en/subquery-optimization-with-exists.html.
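Concretely, for the query in the question this could look like the following on MySQL 5.5. It is a sketch: the rewritten text shown by SHOW WARNINGS varies by version. The join at the end is one way to avoid the dependent subquery, and it assumes cab_requests.id is the primary key, as the EXPLAIN output in the question suggests.

```sql
-- Step 1: ask for the extended plan of the original query.
EXPLAIN EXTENDED
SELECT id FROM cab_request_histories
WHERE cab_request_histories.id = ANY (SELECT id
                                      FROM cab_requests
                                      WHERE cab_requests.request_type = 'pickup');

-- Step 2: the note returned here contains the query as the optimizer rewrote it.
SHOW WARNINGS;

-- Possible join rewrite. Because cab_requests.id is unique, each history
-- row matches at most one request, so the join returns no duplicate rows.
SELECT h.id
FROM cab_request_histories AS h
JOIN cab_requests AS r ON r.id = h.id
WHERE r.request_type = 'pickup';
```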

mysql index issues

My table has 1,000,000 rows and 4 columns:
id cont stat message
1 rgrf 0 ttgthyhtg
2 frrgt 0 tthyrt
3 4r44 1 rrttttg
...
I am performing a select query which is very slow even though I have done indexing
SELECT * FROM tablea WHERE stat='0' order by id LIMIT 1
This query is making my mysql very slow, I checked with mysql explain and found this
explain SELECT * FROM tablea WHERE stat='0' order by id LIMIT 1
and I was shocked by the output but I don't know how to optimize it.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tablea ref stat stat 4 const 216404 Using where
EXPLAIN estimates 216,404 rows to examine, which I have to reduce to 1 or 2, but how?
The problem is that MySQL will normally use only one index per table in a query like this, which is the stat index in your case. So the ORDER BY is performed without the help of an index, which is very slow on 1M rows.
Try the following:
Explicitly use the correct index:
SELECT * FROM tablea USE INDEX(PRIMARY) WHERE stat='0' order by id LIMIT 1
Create a composite index, just as Ollie Jones suggested.
I suggest you try creating a compound index on (stat, id). This may allow your search / order operation to be optimized. There's a downside, of course: you'll incur extra overhead with insertions and updates. (The index name below is only an example.)
CREATE INDEX stat_id ON tablea (stat, id) USING BTREE
Give it a try.