For a table with two non-null columns: id (primary), and date (indexed), I get the following entry in the mysql-slow log.
# Query_time: 16.316747 Lock_time: 0.000049 Rows_sent: 1 Rows_examined: 616021
SET timestamp=1451837371;
select max(date) from mytable where id<896173;
I ran EXPLAIN on this query, and this is the outcome.
id = 1
select_type = SIMPLE
table = mytable
type = range
possible_keys = PRIMARY
key = PRIMARY
key_len = 4
ref = NULL
rows = 337499
Extra = Using where
I tried altering the date index to add the id column to it. However, the number of rows examined is still high. What can I do to reduce it?
The engine needs to look at all rows where id < 896173 and take max(date) from that set. Having a separate index on date and a separate index on id does not really help here. MySQL can use the primary key index on id to identify the subset of matching rows,
but that subset is big enough that it is faster to read all the rows (with sequential access) than to read only the subset (with random access).
I suggest using a more selective index, the inverse of yours:
use an index based on (id, date)
That way id drives the selection and the date field supports it.
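A minimal sketch of that suggestion, assuming the table layout from the question (the index name is illustrative):

```sql
-- Composite index: id drives the range selection, and date is available
-- directly from the index entries.
ALTER TABLE mytable ADD INDEX idx_id_date (id, date);

SELECT MAX(date) FROM mytable WHERE id < 896173;
```

With this index the query should be answerable from the index alone (EXPLAIN should show "Using index"), avoiding table row reads even if many index entries are still scanned.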
I am trying to understand why MySQL is not using the complete index to answer the query. Let me explain.
I am using the imdb database through MySQL version 5.1.73-1 (Debian). I created an index "itid_inf25_mid_ndx" on the table movie_info_idx with columns (info_type_id, info(25), movie_id). Columns info_type_id and movie_id are integers (NOT NULL) and info is of TEXT type, so each index entry takes 4+27+4 = 35 bytes. The output for the statement:
EXPLAIN
SELECT movie_id
FROM movie_info_idx
WHERE info_type_id = 101
AND info > "9";
shows these values:
select_type = SIMPLE
table = movie_info_idx
type = range
possible_keys = itid_inf25_mid_ndx
key = itid_inf25_mid_ndx
key_len = 31
ref = NULL
rows = 841
Extra = Using where
The key_len of 31 and the absence of "Using index" in the Extra column indicate that only the columns (info_type_id, info(25)), which add up to 4+27 = 31 bytes, are being used from the index. I wonder why the optimizer is not using the movie_id column from the index to satisfy the movie_id in the SELECT clause. It seems that the optimizer accesses the base table movie_info_idx to fetch the movie_id values I want to list. Why?
Thank you in advance for your reply.
Once MySQL uses an index for a "range scan" (matching more than one value), it will generally no longer use the columns that follow the range column.
The reason for this is that multi-column indexes are a tree of trees. In order to scan the index on the last column (movie_id), it would have to search an index subtree for every matching value of the range column (info). This is generally inefficient, so MySQL won't do it.
To improve the situation, put the column expected to be used for the range scan last, i.e. order the index as (info_type_id, movie_id, info).
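A sketch of that reordering, assuming the column types from the question (the new index name is made up):

```sql
-- Recreate the index with the range column (info) last, so movie_id
-- precedes it and can be read straight from the index entries.
ALTER TABLE movie_info_idx DROP INDEX itid_inf25_mid_ndx;
ALTER TABLE movie_info_idx
  ADD INDEX itid_mid_inf25_ndx (info_type_id, movie_id, info(25));
```

Note that with movie_id ahead of info, the condition info > "9" can no longer be used as a range; whether the trade-off pays off depends on how selective that condition is.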
More info:
https://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
I have a table in MySQL (5.5.31) which has about 20M rows. The following query:
SELECT DISTINCT mytable.name name FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0 ORDER BY mytable.date_modified DESC LIMIT 0,21
is causing a full table scan; EXPLAIN reports type ALL with extra info Using where; Using temporary; Using filesort. EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable ALL NULL NULL NULL NULL 19001156 Using where; Using temporary; Using filesort
1 SIMPLE mytable_c eq_ref PRIMARY PRIMARY 108 mytable.id 1 Using index
Without the join explain looks like:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable index NULL mytablemod 9 NULL 21 Using where; Using temporary
id_c is the primary key for mytable_c, and mytable_c has at most one row for every row in mytable. date_modified is indexed, but MySQL does not seem to take advantage of that. If I remove the DISTINCT clause, EXPLAIN uses the index and touches only 21 rows, just as expected. If I remove the join, it also does this. Is there any way to avoid the full table scan while keeping the join? EXPLAIN shows MySQL knows it needs only one row from mytable_c and uses the primary key for it, yet it still does a full scan on mytable.
The DISTINCT is there because the query is generated by an ORM, which may produce queries where JOINs yield multiple rows but the SELECTed values are always unique (i.e. if the JOIN is against a multi-value link, only fields that are the same in every joined row appear in the SELECT).
These are just generic comments, not MySQL-specific.
To find all the possible name values from mytable a full scan of either the table or an index needs to happen. Possible options:
full table scan
full index scan of an index starting with deleted (take advantage of the filter)
full index scan of an index starting with name (only column of concern for output)
If there was an index on deleted, the server could find all the deleted = 0 index entries and then look up the corresponding name value from the table. But if deleted has low cardinality or the statistics aren't there to say differently, it could be more expensive to do the double reads of first the index then the corresponding data item. In that case, just scan the table.
If there was an index on name, an index scan could be sufficient, but then the table needs to be checked for the filter. Again frequent hopping from index to table.
The join columns also need to be considered in a similar manner.
Setting the join aside, if you had a multi-part index on the columns (name, deleted), an index scan alone would probably happen.
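A sketch of such an index, using the column names from the question (the index name is illustrative):

```sql
-- Multi-part index covering both the output column and the filter,
-- so the name values can be read without touching the table rows.
ALTER TABLE mytable ADD INDEX idx_name_deleted (name, deleted);
```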
Update
To me the DISTINCT and ORDER BY parts are a bit confusing. Which name's date_modified is to be used for sorting? I think something like this would be a bit clearer:
SELECT mytable.name name --, MIN(mytable.date_modified)
FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
GROUP BY mytable.name
ORDER BY MIN(mytable.date_modified) DESC LIMIT 0,21
Either way, once the ORDER BY comes into play, a full scan needs to be done to find the order. Without the ORDER BY, the first 21 found could suffice.
Why not try moving the condition mytable.deleted = 0 from the WHERE clause into the JOIN ON clause? You can also try FORCE INDEX (mytablemod).
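For what it's worth, the FORCE INDEX variant would look roughly like this, assuming mytablemod is the index on date_modified (as the second EXPLAIN output suggests):

```sql
SELECT DISTINCT mytable.name name
FROM mytable FORCE INDEX (mytablemod)
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
ORDER BY mytable.date_modified DESC
LIMIT 0, 21;
```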
Using MySQL (5.1.66), EXPLAIN says it will scan just 72 rows, while the slow log reports that the whole table was scanned (Rows_examined: 5476845).
How is this possible? I can't figure out what's wrong with the query.
name is a string column with a unique index, and
date is just a regular int column with an index.
This is the EXPLAIN
EXPLAIN SELECT *
FROM table
WHERE name LIKE 'The%Query%'
ORDER BY date DESC
LIMIT 3;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table index name date 4 NULL 72 Using where
Output from Slow Log
# Query_time: 5.545731 Lock_time: 0.000083 Rows_sent: 1 Rows_examined: 5476845
SET timestamp=1360007079;
SELECT * FROM table WHERE name LIKE 'The%Query%' ORDER BY date DESC LIMIT 3;
The rows value that is returned from an EXPLAIN is an estimate of the number of rows that have to be examined to find results that match your query.
If you look, you will see that the key being chosen for the query execution is date, which is probably being picked because of your ORDER BY clause. Because the key being used in the query is unrelated to your WHERE clause, that's probably why the estimate is getting messed up. Even though your WHERE clause is doing a LIKE on the name column, the optimizer may decide not to use an index at all:
Sometimes MySQL does not use an index, even if one is available. One
circumstance under which this occurs is when the optimizer estimates
that using the index would require MySQL to access a very large
percentage of the rows in the table. (In this case, a table scan is
likely to be much faster because it requires fewer seeks.) source
In short, the optimizer is choosing not to use the name key, even though it would be the one that is the limiting factor of rows to be returned. You can try forcing the index to see if that improves the performance.
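A sketch of forcing the index, using the identifiers from the question (note that table is a reserved word in MySQL and would normally need backticks):

```sql
SELECT * FROM `table` FORCE INDEX (name)
WHERE name LIKE 'The%Query%'
ORDER BY date DESC
LIMIT 3;
```

Because the pattern starts with a literal prefix ('The'), the name index can be used for a range scan; the trade-off is that all matches must then be sorted by date (a filesort) before the LIMIT applies.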
I tried the SQL code:
explain SELECT * FROM myTable LIMIT 1
As a result I got:
id select_type table type possible_keys key key_len ref rows
1 SIMPLE myTable ALL NULL NULL NULL NULL 32117
Do you know why the query would run through all rows instead of simply picking the first one?
What can I change in the query (or in my table) to reduce the reported row count for the same result?
The rows count shown is only an estimate of the number of rows to examine. It is not always equal to the actual number of rows examined when you run the query.
In particular:
LIMIT is not taken into account while estimating the number of rows. Even if you have a LIMIT which restricts how many rows will be examined, MySQL will still print the full number.
Source
When the query actually runs only one row will be examined.
Edited for use of subselect:
Assuming the primary key is my_id, use a WHERE clause. For instance:
select * from mytable
where my_id = (
select max(my_id) from mytable
)
While this seems less efficient at first, EXPLAIN shows otherwise: just one row is returned, and the max is found with a read on the index. I do not suggest doing this against partitioned tables in MySQL:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mytable const PRIMARY PRIMARY 4 const 1
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
I have a MySQL query :
SELECT date(FROM_UNIXTIME(time)) as date,
       count(view) as views
FROM `table_1`
WHERE `address` = 1
GROUP BY date(FROM_UNIXTIME(time))
where
view : auto increment and primary key, (int(11))
address : index , (int(11))
time : index, (int(11))
the total number of rows in the table is 270k
this query executes slowly; in mysql-slow.log I got:
Query_time: 1.839096
Lock_time: 0.000042
Rows_sent: 155
Rows_examined: 286435
the EXPLAIN output looks like below:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table_1 ref address address 5 const 139138 Using where; Using temporary; Using filesort
How can I improve this query to speed up execution? Maybe it would be better to compute the date in PHP? But I think taking the timestamp in PHP, converting it to a human-readable date, and doing the "group by" there would take more time than one query in MySQL. Maybe somebody knows how to make this query faster?
When you apply the functions date() and FROM_UNIXTIME() to time in the GROUP BY, you kill any indexing benefit you may have on that field.
Adding a date column would be the only way I can see to speed this up if you need it grouped by day. Without it, you'll need to decrease the overall set you are trying to group by. You could maybe add start/end bounds on time to limit the range; that would decrease the number of rows being transformed and grouped.
You should consider adding an additional DATE column to your table and indexing it.
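A sketch of that suggestion, with made-up column and index names, backfilling the new column from the existing unix timestamp:

```sql
ALTER TABLE `table_1` ADD COLUMN `day` DATE;
UPDATE `table_1` SET `day` = DATE(FROM_UNIXTIME(`time`));
ALTER TABLE `table_1` ADD INDEX idx_address_day (`address`, `day`);

-- The query can then filter and group on indexed columns only:
SELECT `day` AS date, COUNT(`view`) AS views
FROM `table_1`
WHERE `address` = 1
GROUP BY `day`;
```

The application writing the rows would also need to populate `day` on every insert (or a trigger could maintain it).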