Say I have a simple query like this:
SELECT * FROM topics ORDER BY last_post_id DESC
According to MySQL's ORDER BY Optimization docs, an index on last_post_id should be utilized. But EXPLAINing the query says the contrary:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE topic_info ALL NULL NULL NULL NULL 13 Using filesort
Why is my index not being utilized?
Do you really only have 13 rows? The database may be deciding that simply sorting them is quicker than going via the index.
A basic principle of index optimization is to use representative data for testing. A few rows in a table have no predictive value for how either the index or the optimizer will behave in real life.
If you really have only a few records, then the indexing will provide no effective benefit.
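For example, one rough way to test against more representative data (a sketch; it assumes the topics table from the question, and you would list its real columns in place of the placeholder comments):
-- Double the row count repeatedly to approximate production volume
INSERT INTO topics (last_post_id /* , other columns */)
SELECT last_post_id /* , other columns */ FROM topics;
-- ...run a few more times, then refresh the optimizer's statistics
ANALYZE TABLE topics;
EXPLAIN SELECT * FROM topics ORDER BY last_post_id DESC;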
EXPLAIN only tells you about the selection process, so it's expected that the query would have to examine all 13 rows to see whether they meet the WHERE clause (which you don't have, so that's useless information!). It only reports the indexes and keys used for that purpose (evaluating WHERE, JOIN, HAVING). So, regardless of whether the query uses an index to sort or not, EXPLAIN won't tell you about it; don't get caught up in its results.
Yes, the query uses the index to sort quickly. I've noticed the same results from EXPLAIN (i.e. it reports all rows despite the query being sorted by an index and having a LIMIT), and I doubt it's a result of the small number of rows in your table; rather, it's a limitation of what EXPLAIN can report.
Try selecting the specific columns in the same order as they appear in the table. MySQL indexes don't hold when the order is changed.
Related
select id, col1,col2,col3,seq
from `table`
order by seq asc
I have already created an index on seq, but I found that the query doesn't use it and does a filesort instead. Since col1 may hold some large data, I don't want to create a covering index on this table. Are there any solutions to optimize this SQL, the table, or the index? Thanks; my English is not good 😂
The SQL query optimizer apparently estimated the cost of using the index and concluded that it would be better to just do a table-scan and use a filesort of the result.
There is overhead to using a non-covering index. It reads the index in sorted order, but then it has to dereference the primary key to fetch the other columns not covered by the index.
By analogy, this is like reading a book by looking up every word in alphabetical order in the index at the back, then flipping back to the respective pages of the book, one word at a time. Time-consuming, but it's one way of reading the book in order by keyword.
That said, a filesort has overhead too. The query engine has to collect matching rows, and sort them manually, potentially using temporary files. This is expensive if the number of rows is large. You haven't described the size of your table in number of rows.
If the table you are testing has a small number of rows, the optimizer might have reasoned that it would be quick enough to do the filesort, so it would be unnecessary to read the rows by the index.
Testing with a larger table might give you different results from the optimizer.
The query optimizer makes the right choice in the majority of cases. But it's not infallible. If you think forcing it to use the index is better in this case, you can use the FORCE INDEX hint to make it believe that a table-scan is prohibitively expensive. Then if the index is usable at all, it'll prefer the index.
select id, col1,col2,col3,seq
from `table` FORCE INDEX(seq)
order by seq asc
Long time lurker, first time questioner. ;-)
Using PHP 5.6 and MySQL Ver 14.14 Distrib 5.6.41 for Win64 (x86_64). Yeah, I know, a little behind the times, and we're working on updating. But that's where we are now. ;-)
Updates for questions asked:
The index is on the CreateDate. I thought there might be an issue with that column being a DateTime, so I created another column that was just a date, set an index on that, and retried, but it didn't have any effect.
ulc has 8965 rows total. With the index, the search examines 3787.
et has 9530 rows. In the query that doesn't use the index, it searches just one row, as it's looking up the primary key from the first query.
The formatting of the comparison date doesn't seem to matter. I've tried all sorts of formats, including just a straight '2018-01-01 {00:00:00}'. No change.
I've got what I consider a weird one, but I suspect for someone here it's going to be a "duh!" one. I've got a query that includes a date range for the primary table and then goes to get other bits of data from other tables based on a set of unique ids from the first table. Don't worry, I'll have examples below.

When I do the search on just the primary table, the range index works as expected and only searches the relevant rows. However, when I add in the next table with the ON clause, it ignores the index and searches all of the rows of the primary table. If I leave off the ON clause, it goes back to using the index correctly. I tried using FORCE INDEX (USE is ignored), and while that makes it use the index, it slows the query way down. Anyway, here are the queries:
Works:
select CreateDate
from ulc
Inner Join et
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ulc range index_CreateDate index_CreateDate 5 NULL 3787 Using where; Using index
1 SIMPLE et index NULL index_BankProcessorProfile 5 NULL 9530 Using index; Using join buffer (Block Nested Loop)
Doesn't work:
select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ulc ALL TranID,index_CreateDate NULL NULL NULL 8965 Using where
1 SIMPLE et eq_ref PRIMARY PRIMARY 8 showpro.ulc.TranID 1 Using index
For the second one, I just added the ON et.TranID = ulc.TranID.
Additionally, if I change it from a range to a specific date, the index works as well.
Just guessing here without more data, but adding a new table to the JOIN changes the data distribution.
So if in the first case the WHERE condition returns a (relatively) small percentage of the data, in your second case the optimizer decides you'll get faster results without using the index, since the same conditions might not be quite so selective for the new batch of data.
Add the table definitions and a COUNT for both queries, both total and filtered by your conditions, for a better answer.
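For example, counts along these lines (a sketch based on the queries above) would show how selective the date range really is before and after the join:
-- Rows in the date range, primary table only
SELECT COUNT(*)
FROM ulc
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
  AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y");

-- Rows in the date range that survive the join
SELECT COUNT(*)
FROM ulc
INNER JOIN et ON et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
  AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y");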
If you are using DATETIME in your query, it's suggested to use "YYYY-MM-DD HH:MM:SS" in the WHERE clause.
If you are using DATE in your query, it's suggested to use the format "YYYY-MM-DD" in your WHERE clause. You have used STR_TO_DATE("01/01/2018", "%m/%d/%Y"), which will typecast to '2018-01-01', so that seems to be fine.
You can try to find the complexity of the query using EXPLAIN:
explain select CreateDate
from ulc
Inner Join et on et.TranID = ulc.TranID
WHERE ulc.CreateDate >= STR_TO_DATE("01/01/2018", "%m/%d/%Y")
AND ulc.CreateDate <= STR_TO_DATE("08/02/2018", "%m/%d/%Y")
You can also check whether et.TranID and ulc.TranID have proper indexes.
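For example:
SHOW INDEX FROM ulc;
SHOW INDEX FROM et;
-- or, to see the full definitions including primary keys:
SHOW CREATE TABLE ulc;
SHOW CREATE TABLE et;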
(I'm going to have to guess at some things, since you have not provided SHOW CREATE TABLE. As a 'long time lurker', you should have realized this.)
First guess is that TranID is not the PRIMARY KEY of ulc?
The solution is to add a "composite" INDEX(CreateDate, TranID) to ulc. (Actually, you should replace the existing INDEX(CreateDate).) (Second guess: you currently have that index.)
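A sketch of that change (the new index name is arbitrary; the old name index_CreateDate is taken from the EXPLAIN output above):
ALTER TABLE ulc
  DROP INDEX index_CreateDate,
  ADD INDEX idx_CreateDate_TranID (CreateDate, TranID);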
Now I will try to explain why the first query was happy with INDEX(CreateDate) but the second was not.
In the first query, INDEX(CreateDate) is a "covering" index. That is, this index contains all the columns of ulc that are needed by the SELECT. So, it is almost guaranteed that using the index would be better than scanning the table. It will be a "range index scan" of that index.
The second query needs both CreateDate and TranID, so your index won't be "covering". There are two ways to perform the first part of the query. But first, note that (in InnoDB) a secondary index has all the columns of the PRIMARY KEY (third guess: it is (id)).
Way 1: a range scan of the index. But in order to get TranID, it first gets id, then does a lookup in the PRIMARY KEY/data to get TranID. This process is more costly than simply staying in the index, so the Optimizer does not want to do it unless the estimated number of rows is 'small'.
Way 2: scan all 8965 rows, filtering out the ones not needed. Since 3787/8965 is not "small", the Optimizer decides that this is probably faster.
My proposed index is 'covering', thereby avoiding the bouncing back and forth between index and data. So a range index scan is efficient.
Your observation that switching to a single date made use of the index -- Well, 1 row out of 8965 is 'small', so the index (and the bouncing) is deemed to be the faster way.
As for formatting of the date -- True, it does not matter. This is because the parser notices that STR_TO_DATE("01/01/2018", "%m/%d/%Y") is a constant that can be evaluated once, and does so.
My cookbook should take you directly to the composite index without having to scratch your head over this Question.
Your first query is a "cross join", since it has no ON clause to relate the tables together, and it will return about 36 million rows (9530*3787). The second query will have about 3787 rows, maybe fewer (if some of the joins fail to find a match).
"how little changes between the two queries" -- Never think that! The Optimizer will latch on to seemingly insignificant differences. SELECT CreateDate versus SELECT * -- a huge difference. Most of what I said about the 'first query' would be thrown out. Even changing to SELECT ChangeDate, x would be enough to make a big wrinkle. If the datatypes of TranID in the two tables differed enough, the indexes become useless. Etc, etc.
I have this query:
SELECT
COUNT(*) AS `numrows`
FROM (`tbl_A`)
JOIN `tbl_B` ON `tbl_A`.`B_id` = `tbl_B`.`id`
WHERE
`tbl_B`.`boolean_value` <> 1;
I added three indexes, on tbl_A.B_id, tbl_B.id, and tbl_B.boolean_value, but MySQL still says it doesn't use indexes (in the queries-not-using-indexes log) and examines the whole of both tables to retrieve the result.
I need to know what I should do to optimize this query.
EDIT:
Explain output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tbl_B ALL PRIMARY,boolean_value NULL NULL NULL 5049 Using where
1 SIMPLE tbl_A ref B_id B_id 9 tbl_B.id 9 Using where; Using index
The EXPLAIN shows us that an index is used to make the join to tbl_A, but no index is used to filter tbl_B on the boolean value.
An index was available, but the engine chose not to use it. Why does this happen?
Maybe 5049 rows is not a big deal, and the engine estimated that using the index to filter out something like 10% of the rows would be about as fast as doing it without the index.
Booleans take only 3 values: 1, 0, or NULL. So the cardinality of the index will always be very low (3 at most). Low-cardinality indexes are usually dropped by the query analyzer (which is usually right in thinking such an index won't help it a lot).
It would be interesting to see whether the query analyzer behaves the same way when you have a 50/50 distribution of true and false values for this boolean, or when you have just a few false ones.
Now, usually boolean fields are useful only in indexes containing multiple columns; if your queries use all the fields of the index (in WHERE or ORDER BY), the query analyzer will trust that index to really be a good tool.
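For the query above, a hedged sketch of such a multi-column index (the name is arbitrary; note that in InnoDB the primary key is already appended to every secondary index, so a plain index on boolean_value may effectively cover id too):
ALTER TABLE tbl_B ADD INDEX idx_boolval_id (boolean_value, id);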
Note that indexes slow down your writes and take extra space, so do not add useless indexes. Using log_queries_not_using_indexes is a good thing, but you should weigh that log information against the slow query log. If the query is fast, it's not a problem.
If boolean_value is really a boolean value, indexing it is not such a good idea; the index wouldn't be effective.
I have the following query:
SELECT *
FROM shop_user_member_spots
WHERE delete_flag = 0
ORDER BY date_spotted desc
LIMIT 10
When run, it takes a few minutes. The table is around 2.5 million rows.
Here is the table (not designed by me but I am able to make some changes):
And finally, here are the indexes (again, not made by me!):
I've been attempting to get this query running fast for hours now, to no avail.
Here is the output of EXPLAIN:
Any help / articles are much appreciated.
Based on your query, it seems the index you would want would be on (delete_flag, date_spotted). You have an index that has the two columns, but the id column is in between them, which makes the index unhelpful for sorting on date_spotted. Whether MySQL will use the index, given Zohaib's answer, I can't say (sorry, I work most often with SQL Server).
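A sketch of that index (the name is just a placeholder):
ALTER TABLE shop_user_member_spots
  ADD INDEX idx_delete_flag_date_spotted (delete_flag, date_spotted);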
The problem that I see in the explain plan is that the index on the spotted date is not being used; instead, the filesort mechanism is being used to sort. (As far as the index on the delete flag is concerned, we only really gain a performance benefit from an index if the indexed column contains many unique values.)
The MySQL documentation says the index will not be used for the ORDER BY clause if the key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
I guess the same is the case here, although you can try using FORCE INDEX.
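A sketch of that (idx_date_spotted is a placeholder for whatever the index on date_spotted is actually named):
SELECT *
FROM shop_user_member_spots FORCE INDEX (idx_date_spotted)
WHERE delete_flag = 0
ORDER BY date_spotted DESC
LIMIT 10;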
This query pops up in my slow query logs:
SELECT
COUNT(*) AS ordersCount,
SUM(ItemsPrice + COALESCE(extrasPrice, 0.0)) AS totalValue,
SUM(ItemsPrice) AS totalValue,
SUM(std_delivery_charge) AS totalStdDeliveryCharge,
SUM(extra_delivery_charge) AS totalExtraDeliveryCharge,
this_.type AS y5_,
this_.transmissionMethod AS y6_,
this_.extra_delivery AS y7_
FROM orders this_
WHERE this_.deliveryDate BETWEEN '2010-01-01 00:00:00' AND '2010-09-01 00:00:00'
AND this_.status IN(1, 3, 2, 10, 4, 5, 11)
AND this_.senderShop_id = 10017
GROUP BY this_.type, this_.transmissionMethod, this_.extra_delivery
ORDER BY this_.deliveryDate DESC;
The table is InnoDB and has about 880k rows; the query takes between 9 and 12 seconds to execute. I tried adding the following index, with no practical gains:
ALTER TABLE orders ADD INDEX _deliverydate_senderShopId_status (deliveryDate, senderShop_id, status, type, transmissionMethod, extra_delivery);
Any help and/or suggestions are welcome.
Here is the query execution plan right now:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE this_ ref FKC3DF62E57562BA6F 8 const 139894 100.00 Using where; Using temporary; Using filesort
I took the possible_keys value out of the text because I think it listed all the indexes in the table. The key used (FKC3DF62E57562BA6F) looks like:
Keyname Type Unique Packed Field Cardinality Collation Null Comment
FKC3DF62E57562BA6F BTREE No No senderShop_id 4671 A
I'll tell you one thing that you can look at for increasing the speed.
Generally you only have NULL values in the data for unknown or non-applicable rows. It appears to me that, since you're treating NULL as 0 anyway, you should think about getting rid of the NULLs and making sure that all extrasPrice values are 0 where they were previously NULL, so that you can get rid of the time penalty of the COALESCE.
In fact, you could go one step further and introduce another column called totalPrice, which you set with an insert/update trigger to the actual value ItemsPrice + extrasPrice (or ItemsPrice + COALESCE(extrasPrice, 0.0) if you still need the nullability of extrasPrice).
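A sketch of that approach (the column type and trigger names are assumptions; adjust them to the real schema):
-- Assumed type; match it to ItemsPrice/extrasPrice in the real table
ALTER TABLE orders ADD COLUMN totalPrice DECIMAL(10,2) NOT NULL DEFAULT 0.0;

-- Backfill existing rows once
UPDATE orders SET totalPrice = ItemsPrice + COALESCE(extrasPrice, 0.0);

-- Keep the column consistent from now on
CREATE TRIGGER orders_totalprice_bi BEFORE INSERT ON orders FOR EACH ROW
  SET NEW.totalPrice = NEW.ItemsPrice + COALESCE(NEW.extrasPrice, 0.0);
CREATE TRIGGER orders_totalprice_bu BEFORE UPDATE ON orders FOR EACH ROW
  SET NEW.totalPrice = NEW.ItemsPrice + COALESCE(NEW.extrasPrice, 0.0);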
Then, you can simply use:
SELECT
COUNT(*) AS ordersCount,
SUM(totalPrice) AS totalValue,
SUM(ItemsPrice) AS totalValue2,
:
(I'm not sure whether you should have two output columns with the same name or whether that was a typo; that's going to be, at worst, an error and, at best, confusing.)
This moves the cost of the calculation to insert/update time rather than select time and amortises that cost over all the selects - most database tables are read far more often than written. The consistency of the data is maintained due to the trigger and the performance should be better, at the cost of some storage requirements.
But, since the vast majority of database questions are "How can I get more speed?" rather than "How can I use less disk?", that's often a good idea.
Another suggestion is to provide a non-composite index on the column that reduces your result set the fastest (high cardinality). In other words, if you store only two weeks' worth of data (14 different dates) in your table but 400 different shops, you should have an index on senderShop_id and make sure your statistics are up to date.
This should cause the DBMS execution engine to whittle down the result set using that key so that subsequent operations are faster.
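In MySQL, the statistics part can be done with ANALYZE TABLE (the EXPLAIN output above suggests the senderShop_id index itself already exists):
ANALYZE TABLE orders;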
A composite index on deliveryDate,senderShop_id,... will not be able to use senderShop_id to whittle down the results because the key ordering will be senderShop_id within deliveryDate.