MySQL Explain rows limit

Below is my query to get 20 rows with genre_id 1.
EXPLAIN SELECT * FROM (`content`)
WHERE `genre_id` = '1'
AND `category` = 1
LIMIT 20
I have 654 rows in total in the content table with genre_id 1, and I have an index on genre_id. In the query above I limit the result to 20 records, which works fine, but EXPLAIN shows 654 under rows. I tried adding an index on category, and I also removed AND category = 1, but the row count stays the same:
+----+-------------+---------+------+---------------+----------+---------+-------+------+-------------+
| id | select_type | table   | type | possible_keys | key      | key_len | ref   | rows | Extra       |
+----+-------------+---------+------+---------------+----------+---------+-------+------+-------------+
|  1 | SIMPLE      | content | ref  | genre_id      | genre_id | 4       | const |  654 | Using where |
+----+-------------+---------+------+---------------+----------+---------+-------+------+-------------+
Here I found the answer:
LIMIT is not taken into account while estimating the number of rows. Even
if you have a LIMIT which restricts how many rows will be examined, MySQL
will still print the full number.
But in the comments another reply was posted:
LIMIT is now taken into account when estimating number of rows. I’m
not sure which version addressed this, but in 5.1.30, EXPLAIN
accurately takes LIMIT into account.
I am using MySQL 5.5.16 with InnoDB, and despite the above comment it is still not taken into account. So my question is: does MySQL go through all 654 rows to return 20 rows even though I have set a LIMIT? Thanks

Reply from Rick James at MySQL
Does MySQL take LIMIT into account when estimating the number of rows in EXPLAIN?
No. (5.7 with JSON may be a different matter.)
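As a side note (not from the original reply): one way to see how many rows are actually examined, rather than EXPLAIN's estimate, is to reset the session status counters, run the query, and then look at the Handler_read% counters:

FLUSH STATUS;
SELECT * FROM content WHERE genre_id = '1' AND category = 1 LIMIT 20;
SHOW SESSION STATUS LIKE 'Handler_read%';   -- Handler_read_key / Handler_read_next count index entries actually read

If MySQL can stop early, these counters will be much closer to 20 than to 654.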

Related

MySQL Explain : how to calculate total examined rows?

I have an EXPLAIN result below; the problem is how to calculate the total number of examined rows? Please explain this in detail. (This is my first question; if there is any mistake, please correct me, I will be very grateful.)
+----+-------------+--------+---------------+---------+---------+
| id | select_type | type   | possible_keys | key_len | rows    |
+----+-------------+--------+---------------+---------+---------+
|  1 | PRIMARY     | ALL    | NULL          | NULL    | 1423656 |
|  1 | PRIMARY     | eq_ref | PRIMARY       | 8       |       1 |
|  1 | PRIMARY     | ref    | NULL          | 152     |       1 |
|  1 | PRIMARY     | ALL    | NULL          | NULL    |     138 |
|  1 | PRIMARY     | ALL    | NULL          | NULL    |    1388 |
|  1 | PRIMARY     | ALL    | NULL          | NULL    |    1564 |
|  3 | DERIVED     | ALL    | NULL          | NULL    |    1684 |
|  3 | DERIVED     | eq_ref | PRIMARY       | 8       |       1 |
|  2 | DERIVED     | ALL    | NULL          | NULL    |     141 |
+----+-------------+--------+---------------+---------+---------+
From the manual : https://dev.mysql.com/doc/refman/5.7/en/explain-output.html
rows (JSON name: rows)
The rows column indicates the number of rows MySQL believes it must
examine to execute the query.
For InnoDB tables, this number is an estimate, and may not always be
exact.
You have a very high number of 1.4 million rows for one of your tables, but its possible_keys column is empty. That means this is a table that is desperately crying out to be indexed.
A large number of rows to be examined means just that: MySQL needs to read all those rows to give you your result.
If you had posted your tables and your query, we could have helped you figure out what those indexes ought to be.
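As an illustration only (the actual tables and query were never posted, so these names are placeholders), adding an index on the column used to join or filter the 1.4-million-row table would look something like this:

ALTER TABLE big_table ADD INDEX idx_join_col (join_col);   -- placeholder table and column names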

using range with composite key

A MySQL database contains the following two tables (simplified):
(~13000)            (~7000000 rows)
---------------     --------------------
| packages    |     | packages_prices  |
---------------     --------------------
| id (int)    |<- ->| package_id (int) |
| state (int) |     | variant_id (int) |
- - - - - - - -     | for_date (date)  |
                    | price (float)    |
                    - - - - - - - - - -
Each package_id/for_date combination has only a few (average 3) variants.
And state is 0 (inactive) or 1 (active). Around 4000 of the 13000 are active.
First I just want to know which packages have a price set (regardless of variant), so I add a composite key covering (1) for_date and (2) package_id, and I query:
select distinct package_id from packages_prices where for_date > date(now())
This query takes 1 second to return 3500 rows, which is too much. An EXPLAIN tells me that the composite key is used with key_len 3, and that 2000000 rows are examined, 100% filtered, with type range and Using where; Using index; Using temporary. The DISTINCT takes it back to 3500 rows.
If I take out DISTINCT, the Using temporary is no longer mentioned, but the query then returns 1000000 rows and still takes 1 second.
Question 1: why is this query so slow, and how do I speed it up without having to add or change the columns in the table? I would expect that, given the composite key, this query should cost less than 0.01s.
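(For reference, the composite key described above could be created with something like the following; the index name is just illustrative.)

ALTER TABLE packages_prices ADD INDEX idx_date_package (for_date, package_id);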
Now I want to know which active packages have a price set.
So I add a key on state and I add a new composite key just like above, but in reverse order. And I write my query like this:
select distinct packages.id from packages
inner join packages_prices on id = package_id and for_date > date(now())
where state = 1
The query now takes 2 seconds. An EXPLAIN tells me that for the packages table the key on state is used with key_len 4, it examines 4000 rows and filters 100%, with type ref and Using index; Using temporary. For the packages_prices table the new composite key is used with key_len 4, it examines 1000 rows and filters 33.33%, with type ref and Using where; Using index; Distinct. The DISTINCT takes it back to 3000 rows.
If I take out DISTINCT, the Using temporary and Distinct are no longer mentioned, but the query returns 850000 rows and takes 3 seconds.
Question 2: Why is the query that much slower now? Why is range no longer being used according to the EXPLAIN? And why has filtering with the new composite key dropped to 33.33%? I expected the composite key to filter 100% again.
This all seems very basic and trivial, but it has been costing me hours and hours and I still don't understand what's really going on under the hood.
Your observations are consistent with the way MySQL works. For your first query, using the index (for_date, package_id), MySQL will start at the specified date (using the index to find that position), but then has to go to the end of the index, because every next entry can reveal a yet unknown package_id. A specific package_id could e.g. have just been used on the latest for_date. That search will add up to your 2000000 examined rows. The relevant data is retrieved from the index, but it will still take time.
What to do about that?
With some creative rewriting, you can transform your query to the following code:
select package_id from packages_prices
group by package_id
having max(for_date) > date(now());
It will give you the same result as your first query: if there is at least one for_date > date(now()) (which will make it part of your resultset), that will be true for max(for_date) too. But this will only have to check one row per package_id (the one having max(for_date)), all other rows with for_date > date(now()) can be skipped.
MySQL will do that by using index for group-by-optimization (that text should be displayed in your explain). It will require the index (package_id, for_date) (that you already have) and only has to examine 13000 rows: Since the list is ordered, MySQL can jump directly to the last entry for each package_id, which will have the value for max(for_date); and then continue with the next package_id.
Actually, MySQL can use this method to optimize a DISTINCT too (and will probably do that if you remove the condition on for_date), but it is not always able to find a way; a really clever optimizer could have rewritten your query the same way I did, but we are not there yet.
And depending on your data distribution, that method could have been a bad idea: if you had e.g. 7000000 package_ids, but only 20 rows with a for_date in the future, checking each package_id for the maximum for_date would be much slower than just checking the 20 rows that you can easily find via the index on for_date. So knowledge about your data will play an important role in choosing a better (and maybe optimal) strategy.
You can rewrite your second query in the same way (a sketch follows below). Unfortunately, such optimizations are not always easy to find and are often specific to a particular query and situation. If you have a different distribution (as mentioned above), or if you e.g. slightly change your query and add an end date, that method would not work anymore and you would have to come up with another idea.
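A possible rewrite of the second query along the same lines (just a sketch, assuming the (package_id, for_date) index mentioned above):

select p.id
from packages p
join (
    select package_id
    from packages_prices
    group by package_id
    having max(for_date) > date(now())
) pp on pp.package_id = p.id
where p.state = 1;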

Why is MySQL slow when using LIMIT in my query?

I'm trying to figure out why one of my queries is slow and how I can fix it, but I'm a bit puzzled by my results.
I have an orders table with around 80 columns and 775179 rows, and I'm running the following query:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200
which returns 38 rows in 4.5s
When removing the ORDER BY I get a nice improvement:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL LIMIT 200
38 rows in 0.30s
But when removing the LIMIT without touching the ORDER BY I get an even better result:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC
38 rows in 0.10s (??)
Why is my LIMIT so hungry?
GOING FURTHER
I was trying a few things before posting and, after noticing that I had an index on creation_date (which is a datetime), I removed it; the first query now runs in 0.10s. Why is that?
EDIT
Good guess, I have indexes on the other columns that are part of the WHERE.
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200;
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys          | key           | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
|  1 | SIMPLE      | orders | index | id_state_idx,id_mp_idx | creation_date | 5       | NULL | 1719 | Using where |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC;
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| id | select_type | table  | type  | possible_keys          | key       | key_len | ref  | rows  | Extra                                              |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
|  1 | SIMPLE      | orders | range | id_state_idx,id_mp_idx | id_mp_idx | 3       | NULL | 87502 | Using index condition; Using where; Using filesort |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
Indexes do not necessarily improve performance. To better understand what is happening, it would help if you included the explain for the different queries.
My best guess would be that you have an index on id_state, or even on (id_state, id_mp), that can be used to satisfy the WHERE clause. If so, the first query without the ORDER BY would use this index. It should be pretty fast. Even without an index, this requires a sequential scan of the pages in the orders table, which can still be pretty fast.
Then when you add the index on creation_date, MySQL decides to use that index instead for the order by. This requires reading each row in the index, then fetching the corresponding data page to check the where conditions and return the columns (if there is a match). This reading is highly inefficient, because it is not in "page" order but rather as specified by the index. Random reads can be quite inefficient.
Worse, even though you have a limit, you still have to read the entire table because the entire result set is needed. Although you have saved a sort on 38 records, you have created a massively inefficient query.
By the way, this situation gets significantly worse if the orders table does not fit in available memory. Then you have a condition called "thrashing", where each new record tends to generate a new I/O read. So, if a page has 100 records on it, the page might have to be read 100 times.
You can make all these queries run faster by having an index on orders(id_state, id_mp, creation_date). The where clause will use the first two columns and the order by will use the last.
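A minimal sketch of that index (the index name is illustrative):

ALTER TABLE orders ADD INDEX idx_state_mp_created (id_state, id_mp, creation_date);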
The same problem happened in my project.
I did some tests and found out that LIMIT is slow because of row lookups.
See:
MySQL ORDER BY / LIMIT performance: late row lookups
So, the solution is:
(A) when using LIMIT, select not all columns, but only the PK columns;
(B) select all the columns you need, and then join with the result set of (A).
The SQL should look like this:
SELECT
    *
FROM
    orders O1                        -- this is what you want
JOIN
(
    SELECT
        ID                           -- fetch the PK column only, this should be fast
    FROM
        orders
    WHERE
        [your query condition]       -- filter record by condition
    ORDER BY
        [your order by condition]    -- control the record order
    LIMIT 2000, 50                   -- filter record by paging condition
) AS O2
ON
    O1.ID = O2.ID
ORDER BY
    [your order by condition]        -- control the record order
In my DB, the old SQL, which selects all columns and uses "LIMIT 21560, 20", costs about 4.484s. The new SQL costs only 0.063s. The new one is about 71 times faster.
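Applied to the orders query from the question, the pattern might look like this (a sketch that assumes id is the primary key of orders):

SELECT o1.*
FROM orders o1
JOIN (
    SELECT id                                   -- fetch only the PK first
    FROM orders
    WHERE id_state = 2 AND id_mp IS NOT NULL
    ORDER BY creation_date DESC
    LIMIT 200
) o2 ON o1.id = o2.id
ORDER BY o1.creation_date DESC;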
I had a similar issue on a table of 2.5 million records. With the LIMIT part removed, the query took a few seconds. With the LIMIT part, it got stuck forever.
I solved it with a subquery. In your case it would become:
SELECT *
FROM
(SELECT *
FROM orders
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC) tmp
LIMIT 200
I noted that the original query was fast when the number of selected rows was greater than the LIMIT parameter. So the query became extremely slow when the LIMIT parameter was effectively useless.
Another solution is to try forcing an index. In your case you can try:
SELECT *
FROM orders force index (id_mp_idx)
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC
LIMIT 200
The problem is that MySQL is forced to sort the data on the fly. My query with a deep offset like:
ORDER BY somecol LIMIT 99990, 10
Took 2.5s.
I fixed it by creating a new table, which has the data presorted by column somecol and contains only ids; there the deep offset (without the need to use ORDER BY) takes 0.09s.
0.1s is still not fast enough though. 0.01s would be better.
I will end up creating a table that holds the page number as a special indexed column, so instead of doing LIMIT x, y I will query WHERE page = Z.
I just tried it and it is as fast as 0.0013s. The only problem is that the offsetting is based on static numbers (presorted into pages of 10 items, for example). It's not that big a problem though; you can still get any data from any page.
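A sketch of such a paging table (all names are made up, and the table has to be rebuilt or maintained whenever the ordering changes):

CREATE TABLE items_paged (
    page    INT NOT NULL,       -- pre-computed page number in somecol order
    item_id INT NOT NULL,
    PRIMARY KEY (page, item_id)
);

-- a deep page becomes a simple indexed lookup instead of ORDER BY somecol LIMIT 99990, 10
SELECT item_id FROM items_paged WHERE page = 9999;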

Performance difference between DISTINCT and GROUP BY

My understanding is that in (My)SQL a SELECT DISTINCT should do the same thing as a GROUP BY on all columns, except that GROUP BY does implicit sorting, so these two queries should be the same:
SELECT boardID,threadID FROM posts GROUP BY boardID,threadID ORDER BY NULL LIMIT 100;
SELECT DISTINCT boardID,threadID FROM posts LIMIT 100;
They're both giving me the same results, and they're giving identical output from EXPLAIN:
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
|  1 | SIMPLE      | posts | ALL  | NULL          | NULL | NULL    | NULL | 1263320 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
1 row in set
But on my table the query with DISTINCT consistently returns instantly and the one with GROUP BY takes about 4 seconds. I've disabled the query cache to test this.
There are 25 columns, so I've also tried creating a separate table containing only the boardID and threadID columns, but the same problem and performance difference persists.
I have to use GROUP BY instead of DISTINCT so I can include additional columns without them being included in the evaluation of DISTINCT. So now I don't know how to proceed. Why is there a difference?
First of all, your queries are not quite the same - GROUP BY has ORDER BY, but DISTINCT does not.
Note that in either case an index is NOT used, and that cannot be good for performance.
I would suggest creating a compound index on (boardID, threadID) - this should let both queries make use of the index, and both should start working much faster.
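A minimal sketch of that compound index (the index name is illustrative):

ALTER TABLE posts ADD INDEX idx_board_thread (boardID, threadID);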
EDIT: Explanation why SELECT DISTINCT ... LIMIT 100 is faster than GROUP BY ... LIMIT 100 when you do not have indexes.
To execute the first statement (SELECT DISTINCT), the server only needs to fetch 100, or maybe slightly more, rows and can stop as soon as it has 100 distinct rows - no more work to do.
This is because the original SQL statement did not specify any order, so the server can deliver any 100 rows it pleases, as long as they are distinct. But if you were to impose any index-less ORDER BY on this before the LIMIT 100, the query would immediately become slow.
To execute the second statement (SELECT ... GROUP BY ... LIMIT 100), MySQL always does an implicit ORDER BY on the same columns as were used in the GROUP BY. In other words, it cannot quickly stop after fetching the first 100 or so rows; all records must be fetched, grouped and sorted. After that, it applies the ORDER BY NULL you added (which does not do much, I guess, but dropping it may speed things up), and finally it takes the first 100 rows and throws away the remaining result. And of course, this is damn slow.
When you have the compound index, all these steps can be done very quickly in either case.

How do I keep this query out of mysql_slowlog?

SELECT count, item, itemid
FROM items
ORDER BY count DESC
LIMIT 20
Takes 0.0011 seconds.
Explain:
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------+
|  1 | SIMPLE      | items | index | NULL          | count | 4       | NULL |   20 |       |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------+
I have indexes on itemid (primary key) and count (INDEX).
Does anyone have suggestions for how this could be better accomplished?
It seems like your long_query_time variable/setting is extremely short. The default is 10 seconds, but if your query is taking 0.0011 seconds, it obviously shouldn't be logged with the default setting. Try increasing it to something reasonable for your setup (1 second+ probably) and see if this still happens.
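For example (1 second is just an illustrative threshold; to make it permanent, set it in my.cnf as well):

SHOW VARIABLES LIKE 'long_query_time';
SET GLOBAL long_query_time = 1;    -- picked up by new connections
SET SESSION long_query_time = 1;   -- current session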