I have the following query:
select min(a) from tbl where b > ?;
and it takes about 4 seconds on my MySQL instance with index(b, a) (15M rows). Is there a way to speed it up?
Explain:
explain select min(parsed_id) from replays where game_date > '2016-10-01';
id: 1
select_type: SIMPLE
table: replays
partitions: NULL
type: range
possible_keys: replays_game_date_index,replays_game_date_parsed_id_index
key: replays_game_date_parsed_id_index
key_len: 6
ref: NULL
rows: 6854021
filtered: 100.00
Extra: Using where; Using index
Index statement:
create index replays_game_date_parsed_id_index on replays (game_date, parsed_id);
I think the index MySQL is using is the right one. The query should be instantaneous since a SINGLE read from the index should return the result you want. I guess for this query MySQL's SQL optimizer is doing a very poor job.
Maybe you could rephrase your query to trick the SQL optimizer into using a different strategy. Maybe you can try:
select parsed_id
from replays
where game_date > '2016-10-01'
order by parsed_id
limit 1
Is this version any faster?
select @mina
from (select (@mina := least(@mina, a)) as mina
      from tbl cross join
           (select @mina := 999999) params
      where b > ?
     ) t
limit 1;
I suspect this won't make much difference, but I'm not sure what happens under the hood with such a large aggregation function running over an index.
This may or may not help: Change the query and add an index:
SELECT a FROM tbl WHERE b > ? ORDER BY a LIMIT 1;
INDEX(a, b)
Then, if a matching b occurs soon enough in the table, this will be faster than the other suggestions.
On the other hand, if the only matching b is near the end of the table, this will have to scan nearly all the index and be slower than the other options.
a needs to be first in the index. By having both columns in the index, it becomes a "covering" index, hence a bit faster.
It may be that using my SELECT, together with two indexes, will give the Optimizer enough to pick the better approach:
INDEX(a,b)
INDEX(b,a)
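A sketch of adding both, using the generic tbl/a/b names from the original question (the index names are made up):
CREATE INDEX tbl_a_b_index ON tbl (a, b);
CREATE INDEX tbl_b_a_index ON tbl (b, a);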
Schema
Adding either (or both) composite indexes should help.
Shrinking the table size is likely to help...
INT takes 4 bytes. Consider whether a smaller datatype would suffice for any of those columns.
There are 3 dates (DATETIME, TIMESTAMP); do you need all of them?
Is fingerprint varchar(36) a UUID/GUID? If so, it could be packed into BINARY(16); see the sketch after these tips.
640MB is tight -- check the graphs to make sure there is no "swapping". (Swapping would be really bad for performance.)
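A minimal sketch of the BINARY(16) packing, assuming fingerprint holds standard hyphenated UUIDs (fingerprint_bin is a hypothetical column name):
-- Pack: strip the hyphens, then convert the 32 hex chars to 16 bytes
ALTER TABLE replays ADD COLUMN fingerprint_bin BINARY(16);
UPDATE replays SET fingerprint_bin = UNHEX(REPLACE(fingerprint, '-', ''));
-- Unpack: re-insert the hyphens in the 8-4-4-4-12 pattern
SELECT LOWER(CONCAT_WS('-',
    HEX(SUBSTR(fingerprint_bin, 1, 4)),
    HEX(SUBSTR(fingerprint_bin, 5, 2)),
    HEX(SUBSTR(fingerprint_bin, 7, 2)),
    HEX(SUBSTR(fingerprint_bin, 9, 2)),
    HEX(SUBSTR(fingerprint_bin, 11, 6)))) AS fingerprint_text
FROM replays;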
Related
I am using MySQL 5.6 and am trying to optimize the following query:
SELECT t1.field1,
...
t1.field30,
t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = ?
AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime desc;
The result set can contain millions of records, and every row consists of 31 columns.
Right now, EXPLAIN says in Extra that the planner uses 'Using where'.
I tried to add the following index:
create index test_idx ON Table1 (int_field, enum_filed, created_datetime, fk_int);
After that, EXPLAIN says in Extra that the planner uses "Using index condition; Using filesort"
The "rows" value from EXPLAIN is lower with the index than without it, but in practice the execution time is longer.
So, my questions are:
What is the best index for this query?
Why does EXPLAIN say that 'key_len' of the query with the index is 5? Shouldn't it be 4+1+8+4=17?
Should the fields from ORDER BY be in the index?
Should the fields from JOIN be in the index?
Try refactoring your index this way: avoid the created_datetime column (or move it to the right, after fk_int), and move fk_int before the enum_filed column. This way the three columns used for filtering should be used more effectively:
create index test_idx ON Table1 (int_field, fk_int, enum_filed);
Also be sure you have a specific index on the Table2 column pk_int. If you don't, add:
create index table2_pk_int_idx ON Table2 (pk_int);
What is the best index for this query?
Maybe (int_field, created_datetime) (See next Q&A for reason.)
Why does EXPLAIN say that 'key_len' of the query with the index is 5? Shouldn't it be 4+1+8+4=17?
enum_filed != defeats the optimizer. If there is only one other value for that enum (and it is NOT NULL), then use = and the other value, and try INDEX(int_field, enum_filed, created_datetime). The Optimizer is much happier with = than with any inequality; see the sketch below.
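A sketch of that rewrite, assuming enum_filed has exactly two values and 'other_value' is a stand-in for the one you keep:
SELECT t1.field1,
       ...
       t1.field30,
       t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = ?
  AND t1.enum_filed = 'other_value'  -- '=' instead of '!='
ORDER BY t1.created_datetime DESC;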
"5" could be indicating 2 columns, or it could be indicating one INT that is Nullable. If int_field can be NULL, consider changing it to NOT NULL; then the "5" would drop to "4".
Should the fields from ORDER BY be in the index?
Only if the index can completely handle the WHERE. This usually occurs only if all the WHERE tests are =. (Hence, my previous answer.)
Another case for including those columns is "covering"; see next Q&A.
Should the fields from JOIN be in the index?
It depends. One thing that gives some performance benefit is to include all columns mentioned anywhere in the SELECT. This is called a "covering" index and is indicated in EXPLAIN by Using index (not Using index condition). There are too many columns in t1 to add a "covering" index. I think the practical limit is about 5 columns.
My guess for your question № 1:
create index my_idx on Table1(int_field, created_datetime desc, fk_int)
or one of these (but neither will probably be worthwhile):
create index my_idx on Table1(int_field, created_datetime desc, enum_filed, fk_int)
create index my_idx on Table1(int_field, created_datetime desc, fk_int, enum_filed)
I'm supposing 3 things:
Table2.pk_int is already a primary key, judging by the name
The where condition on Table1.int_field is only satisfied by a small subset of Table1
The inequality on Table1.enum_filed (I would fix the typo, if I were you) only excludes a small subset of Table1
Question № 2: the key_len refers to the keys used. Don't forget that there is one extra byte for nullable keys. In your case, if int_field is nullable, it means that this is the only key used, otherwise both int_field and enum_filed are used.
As for questions 3 and 4: If, as I suppose, it's more efficient to start the query plan from the where condition on Table1.int_field, the composite index, in this case also with the correct sort order (desc), enables a scan of the index to get the output rows in the correct order, without an extra sort step. Furthermore, adding also fk_int to the index makes the retrieval of any record of Table1 unnecessary unless a corresponding record is present in Table2. For a similar reason you could also add enum_filed to the index, but, if this doesn't considerably reduce the output record count, the increase in index size will make things worse instead of better. In the end, you will have to try it out (with realistic data!).
Note that if you put another column between int_field and created_datetime in the index, the index won't provide the created_datetime (for a given int_field) in the desired output order.
The issue was fixed by adding more filters (to the WHERE clause) to the query.
Regarding indexes, two of the proposed indexes were helpful:
From @WalterTross, with the following index for the initial query:
(int_field, created_datetime desc, enum_filed, fk_int)
With my short comment: descending indexes are not supported in MySQL 5.6; the DESC keyword is parsed but ignored there.
From @RickJames, with the following index for the modified query:
(int_field, created_datetime)
Thanks everyone who tried to help. I really appreciate it.
I am running a basic select on a table with 189,000 records. The table structure is:
items
id - primary key
ad_count - int, indexed
company_id - varchar, indexed
timestamps
the select query is:
select *
from `items`
where `company_id` is not null
and `ad_count` <= 100
order by `ad_count` desc, `items`.`id` asc
limit 50
On my production servers, just the MySQL portion of the execution takes 300 - 400ms
If I run an explain, I get:
select type: SIMPLE
table: items
type: range
possible_keys: items_company_id_index,items_ad_count_index
key: items_company_id_index
key_len: 403
ref: NULL
rows: 94735
Extra: Using index condition; Using where; Using filesort
When fetching this data in our application, we paginate it in groups of 50, but the above query is "the first page".
I'm not too familiar with dissecting explain queries. Is there something I'm missing here?
An ORDER BY clause with mixed sort directions (ASC and DESC) can cause the creation of temporary tables and a filesort. MySQL up to and including v5.7 doesn't handle such scenarios well at all, and there is actually no point in indexing the fields in the ORDER BY clause, as MySQL's optimizer will never use them.
Therefore, if the application's requirements allow, it's best to use the same direction for all columns in the ORDER BY clause.
So in this case:
order by `ad_count` desc, `items`.`id` asc
Will become:
order by `ad_count` desc, `items`.`id` desc
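For reference, the full query from the question would then read:
SELECT *
FROM `items`
WHERE `company_id` IS NOT NULL
  AND `ad_count` <= 100
ORDER BY `ad_count` DESC, `items`.`id` DESC
LIMIT 50;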
P.S. As a small tip for further reading: MySQL 8.0 is going to change things (among other improvements, it adds descending indexes), so these use cases might perform significantly better when it's released.
Try replacing items_company_id_index with a multi-column index on (company_id, ad_count).
DROP INDEX items_company_id_index ON items;
CREATE INDEX items_company_id_ad_count_index ON items (company_id, ad_count);
This will allow it to use the index to test both conditions in the WHERE clause. Currently, it's using the index just to find non-null company_id, and then doing a full scan of those records to test ad_count. If most records have non-null company_id, it's scanning most of the table.
You don't need to retain the old index on just the company_id column, because a multi-column index is also an index on any prefix columns, because of the way B-trees work.
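To illustrate the leftmost-prefix point: a query filtering only on company_id can still use the new composite index ('acme42' is a made-up value):
SELECT * FROM items WHERE company_id = 'acme42';  -- can use (company_id, ad_count)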
I could be wrong here (depending on your SQL version this could be faster), but try an INNER JOIN with your companies table.
Like:
SELECT *
FROM items
INNER JOIN companies ON companies.id = items.company_id
  AND items.ad_count <= 100
LIMIT 50;
Because of your high index count, maintaining the B-trees will slow down the database each time a new entry is inserted. Maybe remove the index on ad_count? (This depends on how often you use that column in your queries; see the sketch below.)
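A sketch, using the index name from the EXPLAIN output above:
DROP INDEX items_ad_count_index ON items;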
The following query takes 0.7s on a 2.5GHz dual-core Windows Server 2008 R2 Enterprise machine when run against a 4.5GB MySQL database. sIndex10 is a varchar(1024) column:
SELECT COUNT(*) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
An EXPLAIN shows the following information:
id: 1
select_type: SIMPLE
table: e_entity
type: ref
possible_keys: App_Parent,sIndex10
key: App_Parent
key_len: 4
ref: const
rows: 270066
extra: Using Where
There are 230,060 rows matching the first condition and 124,216 rows matching both conditions combined. meta_oid is indexed, and although sIndex10 is also indexed, I think MySQL is correct not to pick that index, as FORCE INDEX (sIndex10) takes longer.
We have had a look at configuration parameters such as innodb_buffer_pool_size and they look correct as well.
Given that this table already has 642,532 records, have we reached the limit of the performance MySQL can offer? Is investing in hardware the only way forward at this point?
WHERE meta_oid=336799 AND sIndex10 = ''
begs for a composite index
INDEX(meta_oid, sIndex10) -- in either order
That is not the same as having separate indexes on the columns.
That's all there is to it.
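A sketch of adding it (the index name is made up; since sIndex10 is varchar(1024), a prefix length may be needed to stay within InnoDB's index key-size limit, and a prefix still serves the = '' test):
CREATE INDEX e_entity_meta_sindex10 ON e_entity (meta_oid, sIndex10(255));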
Index Cookbook
One thing I always do is count(id): since id is (nearly) always indexed, counting just the id only has to look at the index.
So try running the query below and see if it performs any better. You should also add SQL_NO_CACHE when testing to get a better idea of how the query performs.
SELECT SQL_NO_CACHE COUNT(id) FROM e_entity
WHERE meta_oid=336799 AND sIndex10 = ''
Note: This is probably not the complete answer for your question, but it was too long for just a comment.
I am trying to learn how to optimize SQL statements and I was wondering if it's possible to estimate what might be making my queries slow just by looking at the execution plan.
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 382856
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: rf
type: ref
possible_keys: rec_id
key: rec_id
key_len: 4
ref: rs.id
rows: 7
Extra: Using index condition
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: f
type: range
possible_keys: facet_name_and_value,rec_id
key: facet_name_and_value
key_len: 309
ref: NULL
rows: 382856
Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 4. row ***************************
id: 2
select_type: DERIVED
table: r
type: ref
possible_keys: record_id
key: record_id
key_len: 9
ref: sqlse_test_crescentbconflate.f.rec_id
rows: 1
Extra: Using where; Using index
Just by looking at the execution plan I can see that I am using too many joins and that the data set is too big, since MySQL is using filesort, but I might be wrong.
No, it's not really possible to diagnose the performance issue from just the EXPLAIN output.
But the output does reveal that there's a view query that's returning (an estimated) 383,000 rows. We can't tell if that's a stored view or an inline view. But we can see that the results from that query are being materialized into a table (MySQL calls it a "derived table"), and then the outer query is running against that. The overhead for that can be considerable.
What we can't tell is whether it's possible to get the same result without the view, i.e. to flatten the query. And if that's not possible, whether there are any predicates on the outer query that could be pushed down into the view.
A "Using filesort" isn't necessarily a bad thing. But that operation can become expensive for really large sets. So we do want to avoid unnecessary sort operations. (What we can't tell from the EXPLAIN output is whether it would be possible to avoid those sort operations.)
And if the query uses a "covering index" then the query is satisfied from the index pages, without needing to lookup/visit pages in the underlying table, which means less work to do.
Also, make sure the predicates are in a form that enables effective use of an index. That means having conditions on bare columns, not wrapping the columns in functions. e.g.
We want to avoid writing a condition like this:
where DATE_FORMAT(t.dt,'%Y-%m') = '2016-01'
when the same thing can be expressed like this:
where t.dt >= '2016-01-01' and t.dt < '2016-02-01'
With the former, MySQL has to evaluate the DATE_FORMAT function for every row in the table, and then compare the function's return value. With the latter form, MySQL can use a "range scan" operation on an index with dt as the leading column. A range scan operation has the potential to eliminate vast swaths of rows very efficiently, without actually needing to examine the rows.
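For the latter form to pay off, there needs to be an index with dt as its leading column; a minimal sketch, assuming the example's table is named t (hypothetical table and index names):
CREATE INDEX t_dt_index ON t (dt);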
To summarize, the biggest performance improvements would likely come from
avoiding creating a derived table (no view definitions)
pushing predicates into view definitions (where view definitions can't be avoided)
avoiding unnecessary sort operations
avoiding unnecessary joins
writing predicates in a form that can make use of suitable indexes
creating suitable indexes, covering indexes where appropriate
I'd look at the extra field in the execution plan, and then examine your query and your database schema to find ways to improve performance.
using temporary means a temporary table was used, which may slow down the query. Furthermore, temporary tables may end up being written to the disk (and not stored in RAM, which the server typically tries to do if it can) if they are too large.
According to the MySQL 5.5 documentation, here are some reasons temporary tables are created:
Evaluation of UNION statements.
Evaluation of some views, such as those that use the TEMPTABLE algorithm, UNION, or aggregation.
Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
Evaluation of DISTINCT combined with ORDER BY may require a temporary table.
For queries that use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
Evaluation of multiple-table UPDATE statements.
Evaluation of GROUP_CONCAT() or COUNT(DISTINCT) expressions.
Then there's using filesort, which means that a sort was performed which could not be done with existing indexes. This could be no big deal, but you should check what fields are being sorted on and where your indexes are and make sure you're not giving MySQL too much work to do.
You may be able to use the execution plan to see why your queries run slowly because you know how your schema works (what columns and indexes you have). But, we here on Stack Overflow can't possibly use just the execution plan to help you.
There's nothing inherently wrong with filesort. It happens to have an unfortunate name; it simply means that satisfying the query requires sorting the results of a subquery. It doesn't necessarily mean the subquery's results have been placed in an actual file in a filesystem.
Try reading this fine tutorial. http://use-the-index-luke.com/
If you need help with a specific query, please ask another question. Include the following information:
The query.
The results of EXPLAIN
The definitions of the tables involved in the query, including indexes.
Pro tip: SELECT * is harmful to performance in big queries with lots of joins. In particular,
SELECT *
FROM gigantic_table
ORDER BY column
LIMIT 1
is an antipattern, because it slurps a huge amount of data, sorts it, and then discards all but one row of the sorted result. Lots of data gets sloshed around in your server for a small result. That's wasteful, even if it is correct. You can do this kind of thing more efficiently with
SELECT *
FROM gigantic_table
WHERE column =
(SELECT MAX(column) FROM gigantic_table)
The best efficiency will come if column is indexed.
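A minimal sketch (column is the placeholder name from the example, hence the backticks; the index name is made up):
-- With this index, the MAX() subquery reads the last index entry instead of
-- scanning and sorting the whole table:
CREATE INDEX gigantic_table_column_index ON gigantic_table (`column`);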
I mention this because the first row of your explain makes it look like you're romping through a great many rows to find something.
Let's say we have a common join like the one below:
EXPLAIN SELECT *
FROM visited_links vl
JOIN device_tracker dt ON ( dt.Client_id = vl.Client_id
AND dt.Device_id = vl.Device_id )
GROUP BY dt.id
if we execute an explain, it says:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE vl index NULL vl_id 273 NULL 1977 Using index; Using temporary; Using filesort
1 SIMPLE dt ref Device_id,Device_id_2 Device_id 257 datumprotect.vl.device_id 4 Using where
I know that sometimes it's difficult to choose the right indexes when you are using GROUP BY, but what indexes could I set to avoid "Using temporary; Using filesort" in this query? Why is this happening? And especially, why does it still happen when an index is being used?
One point to mention is that the fields returned by the select (* in this case) should either be in the GROUP BY clause or use aggregate functions such as SUM() or MAX(); otherwise unexpected results can occur. This is because, if the database is not told how to choose the fields that are not in the GROUP BY clause, you may get any member of the group, pretty much at random.
The way I look at it is to break the query down into bits.
you have a join on (dt.Client_id = vl.Client_id and dt.Device_id = vl.Device_id) so all of those fields should be indexed in their respective tables.
You are using GROUP BY dt.id so you need an index that includes dt.id
BUT...
an index on (dt.client_id,dt.device_id,dt.id) will not work for the GROUP BY
and
an index on (dt.id, dt.client_id, dt.device_id) will not work for the join.
Sometimes you end up with a query which just can't use an index.
See also:
http://ntsrikanth.blogspot.com/2007/11/sql-query-order-of-execution.html
You didn't post your indices, but first of all, you'll want to have an index for (client_id, device_id) on visited_links, and (client_id, device_id, id) on device_tracker to make sure that query is fully indexed.
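A sketch of those two indexes, assuming the table and column names from the question (the index names are made up):
CREATE INDEX vl_client_device ON visited_links (Client_id, Device_id);
CREATE INDEX dt_client_device_id ON device_tracker (Client_id, Device_id, id);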
From page 191 of the excellent High Performance MySQL, 2nd Ed.:
MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient depending on the query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.
In your case, I think the issue stems from joining on multiple columns and using GROUP BY together, even after the suggested indices are in place. If you remove either (a) one of the join conditions or (b) the GROUP BY, this shouldn't need a filesort.
However, keep in mind that a filesort doesn't always use actual files, it can also happen entirely within a memory buffer if the result set is small enough, so the performance penalty may be minimal. Consider the wall-clock time for the query too.
HTH!