MySQL simple but slow query (wrong indexes?) - mysql

I'm working on a simple query that takes about 1.2 seconds on a MyISAM table populated with 126,000 records:
SELECT * FROM my_table
WHERE primary_key != 5 AND
(
col1 = 528 OR (col2 = 265 AND col3 = 1)
)
ORDER BY primary_key DESC
I have already created a single-column index for each field used in the WHERE clause, but only primary_key (the auto-increment field of my_table) is used as the key, while col1 and col2 are simply ignored and the query becomes much slower. How should I create the indexes (maybe multi-column indexes) or change the query?

You will likely get the best performance if you have the following multi-column indexes, with the equality-tested columns first:
(col1, primary_key)
(col2, col3, primary_key)
And issue the following query:
(SELECT * FROM my_table
WHERE primary_key != 5 AND
col1 = 528)
UNION
(SELECT * FROM my_table
WHERE primary_key != 5 AND
col2 = 265 AND col3 = 1)
ORDER BY primary_key DESC
You may get different, possibly better, performance by changing the order of the columns in the indexes, depending on their cardinality.
In your original query, no index could be used for the entire selection in the WHERE clause, which caused partial table scanning.
In the above query, the first subquery is able to utilize the first index completely, avoiding any table scanning, and the second subquery uses the second index.
Unfortunately, MySQL won't be able to use an index to sort the full result set, and will probably use a filesort to order it. So, if you don't need the records ordered by primary_key, remove the outer ORDER BY clause for better performance; if the result set is small, it shouldn't matter either way.
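The two indexes could be created roughly as follows. This is only a sketch: the index names idx_col1_pk and idx_col2_col3_pk are made up, and the column order shown (equality-tested columns first) is one ordering worth trying, not the only one.

```sql
-- Hypothetical index names; my_table and its columns come from the question.
ALTER TABLE my_table
  ADD INDEX idx_col1_pk (col1, primary_key),
  ADD INDEX idx_col2_col3_pk (col2, col3, primary_key);

-- Check which index each branch of the UNION picks.
EXPLAIN
(SELECT * FROM my_table
 WHERE primary_key != 5 AND col1 = 528)
UNION
(SELECT * FROM my_table
 WHERE primary_key != 5 AND col2 = 265 AND col3 = 1)
ORDER BY primary_key DESC;
```

If either branch still shows key = NULL or key = PRIMARY, it is worth testing the other column order before settling on one.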

Use EXPLAIN to find out what is going on. This will identify what indexes are being used and hence enable you to tune the query.
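For the query in the question, that would look something like this. The comments list the output columns that are usually the most informative; what they actually contain depends on your data and MySQL version.

```sql
EXPLAIN
SELECT * FROM my_table
WHERE primary_key != 5
  AND (col1 = 528 OR (col2 = 265 AND col3 = 1))
ORDER BY primary_key DESC;
-- Columns worth reading in the output:
--   possible_keys : indexes the optimizer considered
--   key           : the index it actually chose (NULL means none)
--   key_len       : how many bytes of that index are used
--   rows          : estimated number of rows examined
--   Extra         : notes such as "Using where" or "Using filesort"
```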

Unfortunately, it is hard to predict which indexes will work better than others without a thorough understanding of MySQL internals. However, for each query (or at least each sub-query or join) MySQL will generally use only one index per table. You have stated that the primary key is being used, so I assume you have looked at the EXPLAIN output.
You will likely want to try multi-column indexes on either all of (primary_key, col1, col2, col3) or a subset of these, in different orders, to find the best result. The best index will depend primarily on the cardinality of the columns and thus may even change over time, depending on the data in the table.

Related

query timeout because of one always valid primary key condition in select mysql

I have a 40M record table having id as the primary key. I execute a select statement as follows:
select * from messages where (some condition) order by id desc limit 20;
This works fine and the query executes in a reasonable time. But when I add an always-true condition as follows, it takes a very long time.
select * from messages where id > 0 and (some condition) order by id desc limit 20;
I guess it is a bug that makes MySQL search from the top of the table instead of the bottom. If there is any other explanation or optimization, it would be a great help.
p.s. With high probability, the results are found in the last 10% of the records in my table.
p.p.s. The "some condition" is like where col1 = x1 and col2 = x2, where col1 and col2 are indexed.
MySQL has to choose whether to use an index to process the WHERE clause, or use an index to control ORDER BY ... LIMIT ....
In the first query, the WHERE clause can't make effective use of an index, so MySQL prefers to use the primary key index to scan the rows in ORDER BY order. In this case it stops as soon as it finds 20 rows that satisfy the WHERE condition.
In the second query, the id > 0 condition in the WHERE clause can make use of the index, so it prefers to use that instead of using the index for ORDER BY. In this case, it has to find all the results that match the WHERE condition, and then sort them by id.
I wouldn't really call this a bug, as there's no specification of precisely how a query should be optimized. It's not always easy for the query planner to determine the best way to make use of indexes. Using the index to filter the rows using WHERE id > x could be better if there aren't many rows that match that condition.
A query like this
select *
from messages
where col1 = x1
and col2 = x2
order by id desc
limit 20;
is best handled by a 'composite' index with the tests for '=' first:
INDEX(col1, col2, id)
Try it, I think it will be faster than either of the queries you are working with.
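A minimal sketch of trying that out. The index name is made up, and the literals 1 and 2 stand in for the question's x1 and x2. Building an index on a 40M-row table takes a while, so a copy of the table is a safer place to experiment.

```sql
-- Hypothetical index name; messages, col1, col2 and id come from the question.
ALTER TABLE messages ADD INDEX idx_col1_col2_id (col1, col2, id);

-- 1 and 2 are placeholders for x1 and x2.
EXPLAIN
SELECT *
FROM messages
WHERE col1 = 1
  AND col2 = 2
ORDER BY id DESC
LIMIT 20;
-- If the composite index is chosen, "key" should show idx_col1_col2_id
-- and "Extra" should not contain "Using filesort".
```

On InnoDB the primary key is implicitly appended to every secondary index, so INDEX(col1, col2) may already behave much like this one; spelling out id makes the intent explicit and also works for MyISAM.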
It’s not a bug. You are searching through a table of 40 million rows where your WHERE clause doesn’t have an index. Add an index on the columns in your WHERE clause and you will see a substantial improvement.

MySQL query index

I am using MySQL 5.6 and am trying to optimize the following query:
SELECT t1.field1,
...
t1.field30,
t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = ?
AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime desc;
A response can contain millions of records and every row consists of 31 columns.
Currently, the Extra column of EXPLAIN shows 'Using where'.
I tried adding the following index:
create index test_idx ON Table1 (int_field, enum_filed, created_datetime, fk_int);
After that, the Extra column of EXPLAIN shows "Using index condition; Using filesort".
The "rows" value from EXPLAIN is lower with the index than without it, but in practice the execution time is longer.
So, my questions are:
1. What is the best index for this query?
2. Why does EXPLAIN say that the 'key_len' of the query with the index is 5? Shouldn't it be 4+1+8+4=17?
3. Should the fields from ORDER BY be in the index?
4. Should the fields from JOIN be in the index?
Try restructuring your index this way:
drop the created_datetime column (or move it to the right, after fk_int) and move fk_int before the enum_filed column, so that the three columns used for filtering and joining can be used more effectively.
create index test_idx ON Table1 (int_field, fk_int, enum_filed);
Also make sure you have an index on the pk_int column of Table2. If you don't, add one:
create index test_idx ON Table2 (pk_int);
What is the best index for this query?
Maybe (int_field, created_datetime) (See next Q&A for reason.)
Why does EXPLAIN say that the 'key_len' of the query with the index is 5? Shouldn't it be 4+1+8+4=17?
enum_filed != defeats the optimizer. If there is only one other value for that enum (and it is NOT NULL), then use = with the other value, and try INDEX(int_field, enum_filed, created_datetime). The Optimizer is much happier with = than with any inequality.
"5" could be indicating 2 columns, or it could be indicating one INT that is Nullable. If int_field can be NULL, consider changing it to NOT NULL; then the "5" would drop to "4".
Should the fields from ORDER BY be in the index?
Only if the index can completely handle the WHERE. This usually occurs only if all the WHERE tests are =. (Hence, my previous answer.)
Another case for including those columns is "covering"; see next Q&A.
Should the fields from JOIN be in the index?
It depends. One thing that gives some performance benefit is to include all columns mentioned anywhere in the SELECT. This is called a "covering" index and is indicated in EXPLAIN by Using index (not Using index condition). There are too many columns in t1 to add a "covering" index. I think the practical limit is about 5 columns.
My guess for your question № 1:
create index my_idx on Table1(int_field, created_datetime desc, fk_int)
or one of these (but neither will probably be worthwhile):
create index my_idx on Table1(int_field, created_datetime desc, enum_filed, fk_int)
create index my_idx on Table1(int_field, created_datetime desc, fk_int, enum_filed)
I'm supposing 3 things:
Table2.pk_int is already a primary key, judging by the name
The where condition on Table1.int_field is only satisfied by a small subset of Table1
The inequality on Table1.enum_filed (I would fix the typo, if I were you) only excludes a small subset of Table1
Question № 2: the key_len refers to the keys used. Don't forget that there is one extra byte for nullable keys. In your case, if int_field is nullable, it means that this is the only key used, otherwise both int_field and enum_filed are used.
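To make the key_len arithmetic concrete, here is a sketch. The byte counts assume the column types implied by the question (a plain 4-byte INT and an ENUM with at most 255 members); check the real definitions with SHOW CREATE TABLE first, since MODIFY must repeat the full column definition.

```sql
-- Inspect the actual column definitions, including NULL / NOT NULL.
SHOW CREATE TABLE Table1;

-- key_len contributions in bytes, under the assumptions above:
--   int_field  INT NULL      -> 4 + 1 = 5   (one extra byte for the NULL flag)
--   int_field  INT NOT NULL  -> 4
--   enum_filed ENUM NOT NULL -> 1
-- So key_len = 5 fits either "nullable int_field alone"
-- or "NOT NULL int_field plus NOT NULL enum_filed".

-- Only if int_field never holds NULL: make that explicit.
-- This can be expensive on a large table, so try it on a copy first.
ALTER TABLE Table1 MODIFY int_field INT NOT NULL;
```

After the change, re-running EXPLAIN shows whether key_len dropped to 4, which would confirm that only int_field is being used from the index.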
As for questions 3 and 4: If, as I suppose, it's more efficient to start the query plan from the where condition on Table1.int_field, the composite index, in this case also with the correct sort order (desc), enables a scan of the index to get the output rows in the correct order, without an extra sort step. Furthermore, adding also fk_int to the index makes the retrieval of any record of Table1 unnecessary unless a corresponding record is present in Table2. For a similar reason you could also add enum_filed to the index, but, if this doesn't considerably reduce the output record count, the increase in index size will make things worse instead of better. In the end, you will have to try it out (with realistic data!).
Note that if you put another column between int_field and created_datetime in the index, the index won't provide the created_datetime (for a given int_field) in the desired output order.
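As a sketch of the first suggested index: the index name is made up and 42 stands in for the bound parameter. One caveat: before MySQL 8.0 the DESC keyword in an index definition is parsed but ignored, so on 5.6 the index is stored in ascending order. MySQL can still read it backwards to satisfy ORDER BY ... DESC.

```sql
-- Hypothetical index name. On MySQL 5.6, adding DESC after created_datetime
-- would be accepted but ignored, so it is left out here.
CREATE INDEX idx_int_created_fk ON Table1 (int_field, created_datetime, fk_int);

-- Only two of the selected columns are shown, for brevity.
EXPLAIN
SELECT t1.field1, t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = 42          -- placeholder for the ? parameter
  AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime DESC;
-- Things to look for: key = idx_int_created_fk on t1, and whether
-- "Using filesort" or "Using temporary" still appears in Extra.
```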
The issue was fixed by adding more filters to the WHERE clause of the query.
Regarding indexes, two of the proposed indexes were helpful:
From @WalterTross, the following index for the initial query:
(int_field, created_datetime desc, enum_filed, fk_int)
with one short comment from me: descending indexes are not supported in MySQL 5.6; the DESC keyword is parsed but ignored.
From @RickJames, the following index for the modified query:
(int_field, created_datetime)
Thanks to everyone who tried to help. I really appreciate it.

Why does MySQL decide to use an index on the column specified in the ORDER BY clause when it's not present in the WHERE clause?

Why does MySQL decide to use an index on the column specified in the ORDER BY clause even though that column is not present in the WHERE clause?
This happens when ORDER BY and LIMIT are used together in the query.
Example query:
select col1, col2,col3 from table_name where col1 = 'x' and col3='y' order by colY limit 3;
table_name has 9M records.
In the absence of the LIMIT clause, MySQL uses the index on the col1 column, which is much faster.
Better
select col1, col2,col3
from table_name
where col1 = 'x'
and col3 = 'y'
order by col4
limit 3;
The optimal index is one of these two:
INDEX(col1, col3, col4)
INDEX(col3, col1, col4)
In both, the Optimizer can completely resolve the WHERE and do the ORDER BY and even stop after 3 rows due to the LIMIT.
Best. Even better performance would come from adding col2 to the end of either. This makes it a "covering" index, so all the work can be done in the index's BTree without touching the data's BTree.
Back to your question
If you don't have one of those indexes, the Optimizer is in a quandary, and often picks the wrong of the two likely choices. Let's say you have only
INDEX(col1), INDEX(col4)
Plan A focuses on filtering: Use col1, but have to sort all the matching rows before peeling off 3. But it might get a million rows and have to sort them.
Plan B avoids sorting: Scan through the index in col4 order. If it is really lucky, the first 3 rows will match the WHERE clause. If it is really unlucky, it will scan the entire table without finding 3 acceptable rows. But they will be sorted!
The "statistics" are meager, and cannot realistically decide between the two choices.
Either Plan could be really slow.
Similar problems occur with JOINs with the WHERE clause filtering on both tables.
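One way to see both plans side by side is to steer the optimizer with index hints and compare the EXPLAIN output and timings. This is a sketch: idx_col1 and idx_col4 are assumed names for the two single-column indexes, and idx_col1_col3_col4 is a made-up name for the composite index.

```sql
-- Plan A: filter through the col1 index, then sort the matches.
EXPLAIN
SELECT col1, col2, col3
FROM table_name FORCE INDEX (idx_col1)   -- assumed index name
WHERE col1 = 'x' AND col3 = 'y'
ORDER BY col4
LIMIT 3;

-- Plan B: walk the col4 index in order and filter along the way.
EXPLAIN
SELECT col1, col2, col3
FROM table_name FORCE INDEX (idx_col4)   -- assumed index name
WHERE col1 = 'x' AND col3 = 'y'
ORDER BY col4
LIMIT 3;

-- The composite index removes the dilemma: filter, order and stop early.
ALTER TABLE table_name ADD INDEX idx_col1_col3_col4 (col1, col3, col4);
```

The hints are meant for diagnosis here; once the composite index exists, the query should not need one.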

Can MySQL use Indexes when there is OR between conditions?

I have two queries, each with its own EXPLAIN results:
One:
SELECT *
FROM notifications
WHERE id = 5204 OR seen = 3
Benchmark (for 10,000 rows): 0.861
Two:
SELECT h.* FROM ((SELECT n.* from notifications n WHERE id = 5204)
UNION ALL
(SELECT n.* from notifications n WHERE seen = 3)) h
Benchmark (for 10,000 rows): 2.064
The results of the two queries above are identical. I also have these two indexes on the notifications table:
notifications(id) -- this is the PK
notifications(seen)
As you know, OR usually prevents effective use of indexes. That's why I wrote the second query (with UNION). But after some tests I found that using OR is still much faster than using UNION. So I'm confused and I really cannot choose the best option in my case.
Based on some logical and reasonable explanations, using union is better, but the result of benchmark says using OR is better. Could you please help me decide which approach I should use?
The query plan for the OR case appears to indicate that MySQL is indeed using indexes, so evidently yes, it can do, at least in this case. That seems entirely reasonable, because there is an index on seen, and id is the PK.
Based on some logical and reasonable explanations, using union is better, but the result of benchmark says using OR is better.
If "logical and reasonable explanations" are contradicted by reality, then it is safe to assume that the logic is flawed or the explanations are wrong or inapplicable. Performance is notoriously difficult to predict; performance testing is essential where speed is important.
Could you please help me decide which approach I should use?
You should use the one that tests faster on input that adequately models that which the program will see in real use.
Note also, however, that your two queries are not semantically equivalent: if the row with id = 5204 also has seen = 3 then the OR query will return it once, but the UNION ALL query will return it twice. It is pointless to choose between correct code and incorrect code on any basis other than which one is correct.
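Two ways to make the UNION form return each row once, as the OR query does. Both rely on id being the primary key, as stated in the question, so a row can only be duplicated by matching both branches.

```sql
-- Option 1: UNION (without ALL) removes duplicate rows.
(SELECT n.* FROM notifications n WHERE n.id = 5204)
UNION
(SELECT n.* FROM notifications n WHERE n.seen = 3);

-- Option 2: keep UNION ALL, but exclude the overlap in the second branch.
(SELECT n.* FROM notifications n WHERE n.id = 5204)
UNION ALL
(SELECT n.* FROM notifications n WHERE n.seen = 3 AND n.id <> 5204);
```

Option 2 avoids the de-duplication step that UNION performs, which may matter when the seen = 3 branch returns many rows.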
index_merge, as the name suggests, combines the primary keys from two indexes using the Sort Merge Join or Sort Merge Union for AND and OR conditions, respectively, and then looks up the rest of the values in the table by PK.
For this to work, the conditions on both indexes must be such that each index yields primary keys in order (yours are).
You can find the strict definition of the conditions in the docs, but in a nutshell, you should filter by all parts of the index with an equality condition, plus possibly <, =, or > on the PK.
If you have an index on (col1, col2, col3), this should be col1 = :val1 AND col2 = :val2 AND col3 = :val3 [ AND id > :id ] (the part in the square brackets is not necessary).
The following conditions won't work:
col1 = :val1 -- you omit col2 and col3
col1 = :val1 AND col2 = :val2 AND col3 > :val3 -- you can only use equality on key parts
As a free side effect, your output is sorted by id.
You could achieve similar results using this:
SELECT *
FROM (
SELECT 5204 id
UNION ALL
SELECT id
FROM mytable
WHERE seen = 3
AND id <> 5204
) q
JOIN mytable m
ON m.id = q.id
, except that in earlier versions of MySQL the derived table would have to be materialized which would definitely make the query performance worse, and your results would not have been ordered by id anymore.
In short, if your query allows index_merge(union), go for it.
The answer is contained in your question. The EXPLAIN output for OR says Using union(PRIMARY, seen) - that means that the index_merge optimization is being used and the query is actually executed by unioning results from the two indexes.
So MySQL can use indexes with OR in some cases, and it does in this one. But index_merge is not always available, or is not used because the index statistics suggest it won't be worth it. In those cases OR may be a lot slower than UNION (or not; always check both versions if you are not sure).
In your test you "got lucky" and MySQL did the right optimization for you automatically. It is not always so.
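To see how much the OR query depends on index_merge, you can compare the plan with the union variant switched off for the current session. This sketch uses the optimizer_switch flag index_merge_union; the exact Extra text depends on your index names.

```sql
-- With default settings, look for type = index_merge and an Extra value
-- along the lines of "Using union(PRIMARY,seen); Using where".
EXPLAIN SELECT * FROM notifications WHERE id = 5204 OR seen = 3;

-- Turn the union variant of index_merge off for this session only.
SET SESSION optimizer_switch = 'index_merge_union=off';
EXPLAIN SELECT * FROM notifications WHERE id = 5204 OR seen = 3;

-- Restore the default for this flag.
SET SESSION optimizer_switch = 'index_merge_union=default';
```

If the second plan falls back to a full table scan, that is roughly what the OR query would cost whenever the optimizer decides against index_merge.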

Is search on index with less verity faster?

I have a table with 3 indexed columns, let's say col1, col2 and col3; they are all int.
My friend said that if col2 has less verity of data (I mean it has only 12, 13, 14, while the other columns have something like random numbers), it is faster to put the condition on that column first in the WHERE clause, because MySQL will start filtering the data from that point!
So basically he says that the second query is faster:
select * from my_table where col1=1 and col2=2 and col3=3
select * from my_table where col2=2 and col1=1 and col3=3
Is that true? I couldn't find any reading material on the subject.
While "verity" is a term applied to datasets, it has nothing to do with databases. I think you are talking about cardinality.
The order in which you declare predicates in your query has no impact at all on how the optimiser resolves the query. You can easily test this yourself using 'explain'.
The order of columns in an index, however, does have a big impact on performance.
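A quick way to check this on your own table. The two EXPLAIN statements should produce the same plan, and SHOW INDEX reports the cardinality estimate the optimizer has for each index.

```sql
-- Same predicates, different textual order: compare the two plans.
EXPLAIN SELECT * FROM my_table WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
EXPLAIN SELECT * FROM my_table WHERE col2 = 2 AND col1 = 1 AND col3 = 3;

-- The Cardinality column is the estimated number of distinct values
-- per index; a column holding only 12, 13, 14 will show a very low value.
SHOW INDEX FROM my_table;
```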