Slow MySQL query (optimizing LEFT OUTER JOIN)

I have a MySQL query, something like this:
SELECT
Main.Code,
Nt,
Ss,
Nac,
Price,
Ei,
Quant,
Dateadded,
Sh,
Crit,
CAST(Ss * Quant AS DECIMAL (10 , 2 )) AS Qss,
CAST(Price * Quant AS DECIMAL (10 , 2 )) AS Qprice,
`Extra0`.`Value`
FROM
Main
LEFT OUTER JOIN
`Extra_fields` AS `Extra0` ON `Extra0`.`Code` = `Main`.`Code`
AND `Extra0`.`Nf` = 2
ORDER BY `Code`
The query is very slow (about 10 sec.). The query without this part:
LEFT OUTER JOIN Extra_fields AS Extra0 ON Extra0.Code = Main.Code AND Extra0.Nf=2
is fast.
Is there some way to optimize the first query?

Add an index on the joined table so MySQL can look up rows by Code and Nf, and include the Value column so the index alone can supply the column needed in the select-list:
ALTER TABLE Extra_fields ADD KEY (Code, Nf, Value);
You may also benefit from an index on Main.Code, so MySQL can read the table in sorted order without having to do a filesort:
ALTER TABLE Main ADD KEY (Code);
I ran EXPLAIN on your query and got this:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Main
partitions: NULL
type: index
possible_keys: NULL
key: Code
key_len: 5
ref: NULL
rows: 1
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: Extra0
partitions: NULL
type: ref
possible_keys: code
key: code
key_len: 10
ref: test.Main.Code,const
rows: 1
filtered: 100.00
Extra: Using index
The first table has no filesort. I had to use ...FROM Main FORCE INDEX(Code)... but it could be because I tested with no rows in the table.
The second table shows it is using an index-only access method ("Extra: Using index"). I assume only three columns from Extra_fields are referenced, and all other columns are from Main.
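To confirm on your own data that both indexes are picked up, something like the following could be run. It is only a sketch. The index names idx_code_nf_value and idx_code are made up for illustration (the statements are otherwise equivalent to the two ALTER TABLE statements above, so run one set or the other). It also assumes Value can be indexed in full: a TEXT or very long VARCHAR column would need a prefix length, and a prefix index cannot serve as a covering index.

```sql
-- Hypothetical index names; the column lists come from the answer above.
ALTER TABLE Extra_fields ADD KEY idx_code_nf_value (Code, Nf, Value);
ALTER TABLE Main ADD KEY idx_code (Code);

-- Re-check the plan. Substitute your full select-list here: a shortened
-- list can change which access method the optimizer picks for Main.
EXPLAIN
SELECT Main.Code, Nt, Ss, Price, Quant, Extra0.Value
FROM Main
LEFT OUTER JOIN Extra_fields AS Extra0
    ON Extra0.Code = Main.Code
   AND Extra0.Nf = 2
ORDER BY Main.Code\G
```

In the output, look for "Using index" on the Extra0 row and for the absence of "Using filesort" on the Main row.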

Related

Which query should be used? Deducing from MySQL Explain

The "Explaining MySQL EXPLAIN" chapter of O'Reilly's Optimizing SQL Statements book has this question at the end.
The following is an example of a business need that retrieves orphaned parent records in a parent/child relationship. This SQL query can be written in three different ways. While the output produces the same results, the QEP shows three different paths.
mysql> EXPLAIN SELECT p.*
-> FROM parent p
-> WHERE p.id NOT IN (SELECT c.parent_id FROM child c)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: p
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 160
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: c
type: index_subquery
possible_keys: parent_id
key: parent_id
key_len: 4
ref: func
rows: 1
Extra: Using index
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT p.*
-> FROM parent p
-> LEFT JOIN child c ON p.id = c.parent_id
-> WHERE c.child_id IS NULL\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: p
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 160
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ref
possible_keys: parent_id
key: parent_id
key_len: 4
ref: test.p.id
rows: 1
Extra: Using where; Using index; Not exists
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT p.*
-> FROM parent p
-> WHERE NOT EXISTS
-> (SELECT parent_id FROM child c WHERE c.parent_id = p.id)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: p
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 160
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: c
type: ref
possible_keys: parent_id
key: parent_id
key_len: 4
ref: test.p.id
rows: 1
Extra: Using index
2 rows in set (0.00 sec)
Which is best? Will data growth over time cause a different QEP to perform better?
As far as I could find, there is no answer in the book or on the internet.
There is an old article from 2009 which I've seen linked on Stack Overflow many times. The test there shows that the NOT EXISTS query is 27% (actually 26%) slower than the other two queries (LEFT JOIN and NOT IN).
However, the optimizer has been improved from version to version, and a perfect optimizer would create the same execution plan for all three queries. As long as the optimizer is not perfect, the answer to "Which query is faster?" can depend on the actual setup (version, settings, and data).
I've run similar tests in the past, and all I remember is that the LEFT JOIN has never been significantly slower than any other method. But out of curiosity I've just created a new test on MariaDB 10.3.13 portable Windows version with default settings.
Dummy data:
set @parents = 1000;
drop table if exists parent;
create table parent(
parent_id mediumint unsigned primary key
);
insert into parent(parent_id)
select seq
from seq_1_to_1000000
where seq <= @parents
;
drop table if exists child;
create table child(
child_id mediumint unsigned primary key,
parent_id mediumint unsigned not null,
index (parent_id)
);
insert into child(child_id, parent_id)
select seq as child_id
, floor(rand(1)*@parents)+1 as parent_id
from seq_1_to_1000000
;
NOT IN:
set @start = TIME(SYSDATE(6));
select count(*) into @cnt
from parent p
where p.parent_id not in (select parent_id from child c);
select @cnt, TIMEDIFF(TIME(SYSDATE(6)), @start);
LEFT JOIN:
set @start = TIME(SYSDATE(6));
select count(*) into @cnt
from parent p
left join child c on c.parent_id = p.parent_id
where c.parent_id is null;
select @cnt, TIMEDIFF(TIME(SYSDATE(6)), @start);
NOT EXISTS:
set @start = TIME(SYSDATE(6));
select count(*) into @cnt
from parent p
where not exists (
select *
from child c
where c.parent_id = p.parent_id
);
select @cnt, TIMEDIFF(TIME(SYSDATE(6)), @start);
Execution time in milliseconds:
#parents | 1000 | 10000 | 100000 | 1000000
-----------|------|-------|--------|--------
NOT IN | 21 | 38 | 175 | 4459
LEFT JOIN | 24 | 40 | 183 | 1508
NOT EXISTS | 26 | 44 | 180 | 4463
I executed each query several times and took the lowest time. SYSDATE is probably not the best way to measure execution time, so don't take these numbers as accurate. Still, we can see that up to 100K parent rows there is not much difference, with NOT IN slightly faster. With 1M parent rows, however, the LEFT JOIN is about three times faster.
Conclusion
So what is the answer? I could just say: "LEFT JOIN" wins. But the truth is that this test proves nothing, and the answer is (as so often): "It depends". When performance matters, the best you can do is run your own tests with real queries against real data. If you don't have real data (yet), create dummy data with the volume and distribution you expect to have in the future.
It depends on what version of MySQL you are using. In older versions, IN ( SELECT ...) performed terribly. In the latest version, it is often as good as the other variants. Also, MariaDB has some optimization differences, probably in this area.
EXISTS( SELECT 1 ... ) is perhaps the clearest in stating the intent. And it perhaps has always (once it came into existence) been fast.
NOT IN and NOT EXISTS are a different animal.
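One concrete difference that is separate from optimizer behaviour, and is not part of the original answer, is NULL handling. The three "orphaned parent" queries return the same rows only when child.parent_id cannot be NULL. A minimal sketch with made-up table names (parent_demo, child_demo) shows what happens when it can:

```sql
-- Hypothetical demo tables; parent_id is deliberately nullable here.
CREATE TABLE parent_demo (id INT PRIMARY KEY);
CREATE TABLE child_demo (child_id INT PRIMARY KEY, parent_id INT NULL);
INSERT INTO parent_demo VALUES (1), (2), (3);
INSERT INTO child_demo VALUES (10, 1), (11, NULL);

-- NOT IN: returns no rows, because "id NOT IN (1, NULL)" is never TRUE
-- (it evaluates to FALSE for id = 1 and to NULL for every other id).
SELECT p.id
FROM parent_demo p
WHERE p.id NOT IN (SELECT c.parent_id FROM child_demo c);

-- NOT EXISTS: returns 2 and 3.
SELECT p.id
FROM parent_demo p
WHERE NOT EXISTS (SELECT 1 FROM child_demo c WHERE c.parent_id = p.id);

-- LEFT JOIN ... IS NULL: also returns 2 and 3.
SELECT p.id
FROM parent_demo p
LEFT JOIN child_demo c ON c.parent_id = p.id
WHERE c.child_id IS NULL;
```

So before comparing the variants for speed, it is worth checking that they are actually equivalent for your schema.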
Some things in your Question that may have impact: func and index_subquery. In similar queries, you may not see these, and that difference may lead to performance differences.
Or, to repeat myself:
"There have been a number of improvements in the Optimizer since 2009.
"To the Author (Quassnoi): Please rerun your tests, and specify which version they are being run against. Note also that MySQL and MariaDB may yield different results.
"To the Reader: Test the variants yourself, do not blindly trust the conclusions in this blog."

MySQL query plan not using Index

I have a nested query and I am trying to see whether it performs any full table scans.
explain delete from ACCESS where ACCESS.MESSAGEID in (select ID from MESSAGE where MESSAGE.CID = 'xzy67sd')\G
The subquery is hitting an index, but the outer query is not using one. Here is the query plan.
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: ACCESS
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 18295
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: MESSAGE
type: unique_subquery
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: func
rows: 1
Extra: Using where
But if I separate the queries and check the query plan, it uses the index. I am not able to understand why and am looking for some hints.
explain delete from ACCESS where ACCESS.MESSAGEID in (2,3)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ACCESS
type: range
possible_keys: ACCESS_ID1
key: ACCESS_ID1
key_len: 8
ref: const
rows: 2
Extra: Using where
The subquery's SELECT statement (shown below) returns constant values, so instead of the SELECT I typed the integers in directly, and the query plan started picking up the index.
select ID from MESSAGE where MESSAGE.CID = 'xzy67sd'\G
Thanks in advance
You don't need a subquery here, and as a general rule, you shouldn't use one in MySQL unless you actually do need one.
DELETE a
FROM ACCESS a
JOIN MESSAGE m ON m.ID = a.MESSAGEID
WHERE m.CID = 'xzy67sd';
This will delete the rows from ACCESS while leaving MESSAGE alone because only ACCESS is listed (by its alias "a") between DELETE and FROM, which is where you specify which tables you want to delete matching rows from.
The optimizer should use the indexes appropriately.
https://dev.mysql.com/doc/refman/5.6/en/delete.html (multi-table syntax)
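Before running the multi-table DELETE, it may be worth previewing the affected rows and checking the plan. The sketch below reuses the table and column names from the question. Note that EXPLAIN on a DELETE statement requires MySQL 5.6.3 or later.

```sql
-- Preview the rows the multi-table DELETE would remove.
SELECT a.*
FROM ACCESS a
JOIN MESSAGE m ON m.ID = a.MESSAGEID
WHERE m.CID = 'xzy67sd';

-- Inspect the plan for the DELETE itself.
EXPLAIN
DELETE a
FROM ACCESS a
JOIN MESSAGE m ON m.ID = a.MESSAGEID
WHERE m.CID = 'xzy67sd'\G
```

If the ACCESS row in the plan still shows type: ALL, compare the definitions of the ACCESS_ID1 index and the MESSAGE.ID column.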

MySQL query stuck at "Sorting Result" for a single row result set

I am building a star schema to act as the backend for an analytics app I am building. My query generator is building queries using a regular star-join pattern. A sample query is below, whereby a fact table is joined to two dimension tables and the dimension tables are filtered by constant values chosen by the end user.
I am using MySQL 5.5 and all tables are MyISAM.
In this problem, I am simply trying to pull the first N rows (in this case, the first 1 row)
EXPLAIN
SELECT fact_table.*
FROM
fact_table
INNER JOIN
dim1 ON (fact_table.dim1_key = dim1.pkey)
INNER JOIN
dim2 ON (fact_table.dim2_key = dim2.pkey)
WHERE
dim1.constant_value = 123
AND dim2.constant_value = 456
ORDER BY
measure1 ASC LIMIT 1
The explain output follows. Both the dimension keys resolve to constant values since there is a unique key applied to their value.
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: dim1
type: const
possible_keys: PRIMARY,dim1_uk
key: dim1_uk
key_len: 8
ref: const
rows: 1
Extra: Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: dim2
type: const
possible_keys: PRIMARY,dim2_uk
key: dim2_uk
key_len: 8
ref: const
rows: 1
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: fact_table
type: ref
possible_keys: my_idx
key: my_idx
key_len: 16
ref: const,const
rows: 50010
Extra: Using where
And here is the index on the fact table:
show indexes from fact_table
*************************** 10. row ***************************
Table: fact_table
Non_unique: 1
Key_name: my_idx
Seq_in_index: 1
Column_name: dim1_key
Collation: A
Cardinality: 24
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 11. row ***************************
Table: fact_table
Non_unique: 1
Key_name: my_idx
Seq_in_index: 2
Column_name: dim2_key
Collation: A
Cardinality: 70
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 12. row ***************************
Table: fact_table
Non_unique: 1
Key_name: my_idx
Seq_in_index: 3
Column_name: measure1
Collation: A
Cardinality: 5643
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
When profiling this query, I see the query spends the majority of its time performing a filesort operation "sorting result". My question is, even when using the correct index, why can't this query simply pull out the first value without doing a sort? The my_idx is already sorted on the right column and the two columns appearing first in the index resolve as constants, as shown in the plan.
If I rewrite the query, as follows, I am able to get the plan I want, with no file sorting.
SELECT fact_table.*
FROM
fact_table
WHERE
dim1_key = (select pkey from dim1 where constant_value = 123)
AND dim2_key = (select pkey from dim2 where constant_value = 456)
ORDER BY
measure1 ASC LIMIT 1
It would be expensive to change the tool generating these SQL commands so I would like to avoid this filesort even when the query is written in the original format.
My question is, why is MySQL opting to do a filesort even when the first keys on the index are constants (via an INNER JOIN) and the index is sorted in the right order? Is there a way around this?
Because the order of the resultset depends on the index used for reading the first table in the JOIN, but, as you see in EXPLAIN, the JOIN actually starts from dim1 table.
It might seem strange, but to implicitly force MySQL to start from fact_table you would need to change the indexes in the dimension tables to (pkey, constant_value) instead of (constant_value); otherwise the MySQL optimizer will start with the table for which the condition constant_value = some_value returns the fewest rows. The problem is that you might need those indexes for other queries.
Instead, you may try to add STRAIGHT_JOIN option to the SELECT and explicitly force the order.
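As a sketch of that suggestion, the original query with the STRAIGHT_JOIN modifier looks like the following. With this modifier MySQL joins the tables in the order they appear in the FROM clause, so fact_table is read first. Whether this removes the filesort or makes the query faster on your data is not guaranteed, so compare the EXPLAIN output and timings with and without it.

```sql
-- STRAIGHT_JOIN after SELECT forces the join order: fact_table, dim1, dim2.
EXPLAIN
SELECT STRAIGHT_JOIN fact_table.*
FROM fact_table
INNER JOIN dim1 ON (fact_table.dim1_key = dim1.pkey)
INNER JOIN dim2 ON (fact_table.dim2_key = dim2.pkey)
WHERE dim1.constant_value = 123
  AND dim2.constant_value = 456
ORDER BY measure1 ASC
LIMIT 1\G
```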

Index of three columns in mySQL

I have 3 columns a, b and c, and I have indexed them as (a,b,c). I have a query like this:
SELECT * FROM tablename WHERE a=something and c=someone
My question is: does this query use this index or not?
It may use the first column (a) of the index, but it can't use the third column (c).
One way you can tell is from the output of EXPLAIN.
Here's an example:
mysql> create table tablename (a int, b int, c int, key (a,b,c));
...I filled it with some random data...
mysql> explain SELECT * FROM tablename WHERE a=125 and c=456\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tablename
type: ref
possible_keys: a
key: a
key_len: 5
ref: const
rows: 20
Extra: Using where; Using index
The above shows ref: const, which means only one of the constant values is used to find rows in the index. Also, key_len: 5 shows that only a subset of the index is used, since an index entry with three integers should be larger than 5 bytes.
mysql> explain SELECT * FROM tablename WHERE a=125 and b = 789 and c=456\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tablename
type: ref
possible_keys: a
key: a
key_len: 15
ref: const,const,const
rows: 1
Extra: Using index
When we use conditions on all three columns, it shows ref: const,const,const showing that all three values are being used to look up index entries. And the key_len is large enough to be an entry of three integers.
As Mihal says, if you prefix the query with EXPLAIN, the optimizer will tell you whether it uses the index. Bill is partially correct in that only the value for a is looked up in the index. However, if the table contains only the columns a, b and c, the index is covering, so the values for b and c are retrieved from the index without touching the table data. The DBMS still scans all values of b and c under the matching a in the index; it does not go directly to the specified value for c.
It may be possible to fudge a query so that it uses the index to a greater depth, assuming b is an integer (the bounds below are the signed MEDIUMINT range; widen them to match b's actual type):
SELECT *
FROM tablename
WHERE a='something'
AND b BETWEEN -8388608 AND 8388607
AND c='someone'
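If the query always filters on a and c, another option, not mentioned in the answers above, is a separate index in which those two columns come first, so both equality conditions can be used for the lookup. A minimal sketch, using the example table from the first answer and a made-up index name idx_a_c:

```sql
-- Hypothetical index name; a and c lead the index, so both equality
-- conditions can be resolved from the leftmost columns.
ALTER TABLE tablename ADD KEY idx_a_c (a, c);

-- Check which index the optimizer picks and how much of it is used.
EXPLAIN SELECT * FROM tablename WHERE a = 125 AND c = 456\G
```

If the optimizer picks the new index, the ref column should list two constants instead of one. The trade-off is one more index to maintain on every write.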

Why isn't MySQL using any of these possible keys?

I have the following query:
SELECT t.id
FROM account_transaction t
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY tc.id
When I do an EXPLAIN the first row shows, among other things, this:
table: t
type: ALL
possible_keys: account_id,transaction_code_id,account_transaction_transaction_code_id,account_transaction_account_number
key: NULL
rows: 465663
Why is key NULL?
Another issue you may be encountering is a data type mismatch. For example, if your column is a string data type (CHAR, for example) and your query does not quote the number, MySQL won't use the index.
SELECT * FROM tbl WHERE col = 12345; # No index
SELECT * FROM tbl WHERE col = '12345'; # Index
Source: Just fought this same issue today, and learned the hard way on MySQL 5.1. :)
Edit: Additional information to verify this:
mysql> desc das_table \G
*************************** 1. row ***************************
Field: das_column
Type: varchar(32)
Null: NO
Key: PRI
Default:
Extra:
*************************** 2. row ***************************
[SNIP!]
mysql> explain select * from das_table where das_column = 189017 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: das_column
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 874282
Extra: Using where
1 row in set (0.00 sec)
mysql> explain select * from das_table where das_column = '189017' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: das_column
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 34
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
It might be because the statistics are out of date, or because MySQL knows that you always have a 1:1 ratio between the two tables.
You can force an index to be used in the query and see whether that speeds things up. If it does, try running ANALYZE TABLE to make sure the statistics are up to date.
By specifying USE INDEX (index_list), you can tell MySQL to use only one of the named indexes to find rows in the table. The alternative syntax IGNORE INDEX (index_list) can be used to tell MySQL to not use some particular index or indexes. These hints are useful if EXPLAIN shows that MySQL is using the wrong index from the list of possible indexes.
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Each hint requires the names of indexes, not the names of columns. The name of a PRIMARY KEY is PRIMARY. To see the index names for a table, use SHOW INDEX.
From http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
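Applied to the query in the question, that advice could look like the sketch below. The index name transaction_code_id is taken from the possible_keys list in the question's EXPLAIN output; any of the other listed names could be tried the same way.

```sql
-- Refresh the index statistics first.
ANALYZE TABLE account_transaction;

-- Then compare the plan when one of the listed possible_keys is forced.
EXPLAIN
SELECT t.id
FROM account_transaction t FORCE INDEX (transaction_code_id)
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY tc.id\G
```

If the forced plan is not measurably faster than the full scan, the optimizer's original choice was probably reasonable.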
Index for the group by (=implicit order by)
...
GROUP BY tc.id
The GROUP BY does an implicit sort on tc.id.
tc.id is not listed as a possible key,
but t.transaction_code_id is.
Change the code to
SELECT t.id
FROM account_transaction t
JOIN transaction_code tc ON t.transaction_code_id = tc.id
JOIN account a ON t.account_number = a.account_number
GROUP BY t.transaction_code_id
This will put the potential index transaction_code_id into view.
Indexes for the joins
If the joins (nearly) fully join the three tables, there's no need to use the index, so MySQL doesn't.
Other reasons for not using an index
Another reason is a large percentage of the rows under consideration (40% IIRC) holding the same value. In that case MySQL does not use the index, because not using it is faster.
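To see whether that applies here, the distribution of the indexed column can be checked directly. This is a rough sketch using the table and column names from the question; the threshold at which the optimizer gives up on an index is not something the queries below can tell you.

```sql
-- Index names and cardinality estimates as MySQL sees them.
SHOW INDEX FROM account_transaction;

-- How skewed is the column? A value holding a large share of the rows
-- means an index lookup on it saves little over a full scan.
SELECT transaction_code_id,
       COUNT(*) AS row_count,
       COUNT(*) / (SELECT COUNT(*) FROM account_transaction) AS share
FROM account_transaction
GROUP BY transaction_code_id
ORDER BY row_count DESC
LIMIT 5;
```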