select id, col1,col2,col3,seq
from `table`
order by seq asc
I have already created an index on `seq`, but I found that the query doesn't use the index and does a filesort when selecting. Because col1 may store some large data, I don't want to create a covering index on this table. Are there any solutions to optimize this SQL, table, or index? Thanks, my English is not good 😂😂😂
The SQL query optimizer apparently estimated the cost of using the index and concluded that it would be better to just do a table-scan and use a filesort of the result.
There is overhead to using a non-covering index. It reads the index in sorted order, but then has to dereference the primary key to get the other columns not covered by the index.
By analogy, this is like reading a book by looking up every word in alphabetical order in the index at the back, and then flipping back to the respective pages of the book, one word at a time. Time-consuming, but it's one way of reading the book in order by keyword.
That said, a filesort has overhead too. The query engine has to collect matching rows, and sort them manually, potentially using temporary files. This is expensive if the number of rows is large. You haven't described the size of your table in number of rows.
If the table you are testing has a small number of rows, the optimizer might have reasoned that it would be quick enough to do the filesort, so it would be unnecessary to read the rows by the index.
Testing with a larger table might give you different results from the optimizer.
The query optimizer makes the right choice in the majority of cases. But it's not infallible. If you think forcing it to use the index is better in this case, you can use the FORCE INDEX hint to make it believe that a table-scan is prohibitively expensive. Then if the index is usable at all, it'll prefer the index.
select id, col1,col2,col3,seq
from `table` FORCE INDEX(seq)
order by seq asc
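To check whether the hint actually changes the plan, you can compare the output of EXPLAIN with and without it; the "key" column shows the chosen index, and "Using filesort" in "Extra" should disappear once the index is used (this assumes, as in the hint above, that the index on seq is literally named seq):
EXPLAIN select id, col1,col2,col3,seq
from `table`
order by seq asc;

EXPLAIN select id, col1,col2,col3,seq
from `table` FORCE INDEX(seq)
order by seq asc;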
Having a 10+ million row table with three columns: one, two, three, and an SQL query like SELECT * FROM table ORDER BY one, two, three LIMIT 1 - do I really need to create a multi-column index using all three columns?
I know for sure that if one and two match, there would be at most 10 rows with distinct three.
Is it enough for fast SELECTs?
CREATE INDEX MY_INDEX ON table (one, two);
With INDEX(one, two, three), the query will go straight down the BTree to the one (LIMIT 1) desired row.
With INDEX(one, two), the query will go straight down the BTree to the first such row, then scan forward the up-to-10 rows, save them to a tmp table, sort them (ORDER BY includes three) (probably done in memory), and deliver the first one. Although this sounds more complex, it will not (in this example) be much slower.
It will not be a "table scan" ("ALL"), but perhaps a "range" scan. Use EXPLAIN SELECT ... to see.
If three is a bulky string, then the 3-col index will be bulkier; this has some impact on disk space and performance.
If you need only (one, two) for some other queries, then either index works reasonably well (barring the "bulky" comment).
If you do SELECT one, two, three FROM ..., the 3-part index will be better because it will be "covering". SELECT * won't have such a bonus.
Bottom line: Either index is "OK"; many other factors come into play, making it hard to say for sure what to do.
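For reference, a minimal sketch of the two alternatives being compared (the index names are just examples; check what you actually get with EXPLAIN):
-- 2-column index: dive to the first (one, two) group, then sort at most 10 rows
CREATE INDEX my_index_two ON `table` (one, two);

-- 3-column index: dive straight to the single desired row; also covering for SELECT one, two, three
CREATE INDEX my_index_three ON `table` (one, two, three);

EXPLAIN SELECT * FROM `table` ORDER BY one, two, three LIMIT 1;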
You might think MySQL is clever enough to only read at most the first 10 rows using the index and then sort these. Unfortunately, it isn't (because the optimizer doesn't regard the limit at this point). You can verify that by using explain select ..., it will show that MySQL will do a full table scan ("ALL").
The documentation describes the conditions under which an index can be used to optimize ORDER BY:
The index can also be used even if the ORDER BY does not match the index exactly, as long as all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause.
Your third column does not satisfy this. So this query will not use this index (which does not mean that it might not be useful for something else).
Since MySQL 5.6, there is however the so-called filesort priority queue optimization to accommodate the limit: while MySQL will still read the whole table, it will not sort the whole table (which would be a time-consuming process), but will stop when it knows what the first row will be, which makes your query acceptably fast.
But you can rewrite your query to do exactly what you are thinking of:
SELECT * FROM
(select * from table ORDER BY one, two LIMIT 10) sub
order by one, two, three limit 1;
This will read the first 10 rows using that index, and then just sort these. It will of course only work correctly if you are absolutely sure you will only have at most 10 rows.
A more general way to optimize your query independently from knowing the maximum number of possible rows would be e.g.
SELECT * FROM table
where one = (select min(one) from table)
order by one, two, three limit 1;
This will use the index to reduce the number of rows that have to be read and filesorted by looking up the lowest value for one first (using the index) and only considering these rows. You can similarly include a condition for two.
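A sketch of what additionally including a condition for two could look like (same placeholder table name as above):
-- With the index on (one, two), both subqueries are cheap index dives,
-- and only the handful of remaining rows needs to be filesorted by three
SELECT * FROM table
where one = (select min(one) from table)
  and two = (select min(two) from table
             where one = (select min(one) from table))
order by one, two, three limit 1;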
Or you can simply use all three columns in your index (although depending on the size of your third column, it can make sense not to do this). These kinds of optimizations tend to catch up with you at some point. If you e.g. use the first method, and in two years 11 rows become possible, you (or your successor) will have to remember that you have this implied condition in your code.
select tblfarmerdetails.ncode,
tblfarmerdetails.region,tblfarmerdetails.province, tblfarmerdetails.municipality,
concat(tblfarmerdetails.farmerfname, ' ', tblfarmerdetails.farmerlname) as nameoffarmer,
concat(tblfarmerdetails.spousefname, ' ',tblfarmerdetails.spouselname) as nameofspouse, tblstatus.statusoffarmer from tblfarmerdetails
INNER Join
tblstatus on tblstatus.ncode = tblfarmerdetails.ncode where tblstatus.ncode = tblfarmerdetails.ncode order by tblfarmerdetails.region
It takes too long to retrieve 11.2M rows. How will I improve this query?
Firstly, format the query so it is readable, or at least decipherable, by a human.
SELECT f.ncode
, f.region
, f.province
, f.municipality
, CONCAT(f.farmerfname,' ',f.farmerlname) AS nameoffarmer
, CONCAT(f.spousefname,' ',f.spouselname) AS nameofspouse
, s.statusoffarmer
FROM tblfarmerdetails f
JOIN tblstatus s
ON s.ncode = f.ncode
ORDER BY f.region
It's likely that a lot of time is spent doing a "Using filesort" operation, to sort all the rows in the order specified in the ORDER BY clause. For sure a sort operation is going to occur if there's not an index with a leading column of region.
Having a suitable index available, for example
... ON tblfarmerdetails (region, ... )
means that MySQL may be able to return the rows "in order", using the index, without having to do a sort operation.
If MySQL has a "covering index" available, i.e. an index that contains all of the columns of the table reference in the query, MySQL can make use of that index to satisfy the query without needing to visit pages in the underlying table.
But given the number of columns, and the potential that some of these columns may be good-sized VARCHARs, this may not be possible or workable:
... ON tblfarmerdetails (region, ncode, province, municipality, farmerfname, farmerlname, spousefname, spouselname)
(MySQL does have some limitations on indexes. The goal of the "covering index" is to avoid lookups to pages in the table.)
And make sure that MySQL knows that ncode is UNIQUE in tblstatus. Either that's the PRIMARY KEY or there's a UNIQUE index.
We suspect the tblstatus table contains a small number of rows, so the join operation is probably not that expensive. But an appropriate covering index, with ncode as the leading column, wouldn't hurt:
... ON tblstatus (ncode, statusoffarmer)
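Spelled out as complete statements, those suggestions might look like this (the index names are just examples):
-- At minimum, let the rows come back in region order without a sort
CREATE INDEX ix_farmer_region ON tblfarmerdetails (region);

-- Covering index for the (presumably small) status table
CREATE INDEX ix_status_ncode_status ON tblstatus (ncode, statusoffarmer);

-- Or, if ncode uniquely identifies a row in tblstatus and isn't already the primary key:
-- ALTER TABLE tblstatus ADD UNIQUE INDEX ux_status_ncode (ncode);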
If MySQL has to perform a "Using filesort" operation to get the rows ordered (to satisfy the ORDER BY clause), on a large set, that operation can spill to disk, and that can add (sometimes significantly) to the elapsed time.
The resultset produced by the query has to be transferred to the client. And that can also take some clock time.
And the client has to do something with the rows that are returned.
Are you sure you really need to return 11.2M rows? Or, are you only needing the first couple of thousand rows?
Consider adding a LIMIT clause to the query.
And how long are those lname and fname columns? Do you need MySQL to do the concatenation for you, or could that be done on the client as the rows are processed?
It's possible that MySQL is having to do a "Using temporary" to hold the rows with the concatenated results. And MySQL is likely allocating enough space for that return column to hold the maximum possible length of lname plus the maximum possible length of fname. And if that's a multibyte character set, that will double or triple the storage over a single-byte character set.
To really see what's going on, you'd need to take a look at the query execution plan. You get that by preceding your SELECT statement with the keyword EXPLAIN:
EXPLAIN SELECT ...
The output from that will show the operations that MySQL is going to do, what indexes it's going to use. And armed with knowledge about the operations the MySQL optimizer can perform, we can use that to make some pretty good guesses as to how to get the biggest gains.
This question MAY have been asked before, but I can't for the life of me find the answer.
In order to avoid
SELECT * FROM student WHERE name LIKE '%searchphrase%' ORDER BY score
which, as I understand it, will never use an index and will always use a filesort, there's the ability to use a FULLTEXT index.
The question: How can I order by score without a filesort if I perform a fulltext search?
Result rows will come out in whatever order they're stored in the FULLTEXT index, which certainly isn't the order required by ORDER BY score, so the fulltext matches need to be sorted for ORDER BY in a separate step, and this is what filesort does.
The only alternative execution plan would be to retrieve rows in score order, and then apply fulltext match row by row, which totally defies any fulltext specific optimizations.
What may make sense in your case is to have a combined index on (score, name) and stick with LIKE if your search expression matches a large part of the student rows; in that case you'd get an index scan in score order, and the LIKE expression can be evaluated on the index entries. So you're getting a full index scan instead of a full table scan, and no extra sort is needed because index entries are already ordered by score.
But if the number of matching rows is rather small compared to the total number of rows in the table, doing a fulltext index lookup first, followed by a filesort, will be the better plan.
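A sketch of the combined-index idea from above, using the table and columns from the question (whether the optimizer actually chooses the index scan depends on its cost estimates, so verify with EXPLAIN):
-- Index ordered by score, with name included so the LIKE filter can be
-- evaluated against index entries instead of full table rows
CREATE INDEX ix_student_score_name ON student (score, name);

EXPLAIN SELECT * FROM student WHERE name LIKE '%searchphrase%' ORDER BY score;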
SELECT * FROM messages_messages WHERE (from_user_id=? AND to_user_id=?) OR (from_user_id=? AND to_user_id=?) ORDER BY created_at DESC
I have another query, which is this:
SELECT COUNT(*) FROM messages_messages WHERE from_user_id=? AND to_user_id=? AND read_at IS NULL
I want to index both of these queries, but I don't want to create 2 separate indexes.
Right now, I'm using 2 indexes:
[from_user_id, to_user_id, created_at]
[from_user_id, to_user_id, read_at]
I was wondering if I could do this with one index instead of 2?
These are the only 2 queries I have for this table.
The docs explain fairly completely how MySQL uses indices. In particular, its optimizer can use any left prefix of a multi-column index. Therefore, you could drop either of your two existing indices, and the other would be eligible for use in both queries, though it would be more selective / useful for one than for the other.
In principle, it could be more beneficial to keep your first index, provided that the created_at column was indexed in descending order. In practice, MySQL allows you to specify index column order, but prior to 8.0 it implements only ascending order (MySQL 8.0 added real descending indexes). Therefore, having created_at in your index probably doesn't help very much.
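For example, keeping only the first index, both queries can use its (from_user_id, to_user_id) prefix (the index name here is just an example):
-- The first query can use all three columns; the COUNT query can still use the
-- (from_user_id, to_user_id) prefix and then check read_at on the matching rows.
CREATE INDEX ix_messages_from_to_created
    ON messages_messages (from_user_id, to_user_id, created_at);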
No, you need both indexes for these two queries if you want to optimize fully.
Once you reach the column used for either sorting or range comparison (IS [NOT] NULL counts as a range predicate for this purpose), you don't get any benefit from putting more columns in the index. In other words, your index can have:
Some columns that are used in equality predicates
One column that is used either in a range predicate, or to avoid a filesort -- but not both.
Extra columns used in neither searching nor sorting, but only for the sake of a covering index.
So you cannot make a four-column index that serves both queries.
The only way you can reduce this to one index, as @JohnBollinger says, is to make an index that optimizes for one query and let the second query use a prefix of it. But that won't work as well.
A quick question. I have a simple table with this structure:
USERS_TABLE = {id}, name, number_field_1, number_field_2, number_field_3, number_field_4, number_field_5
Sorting is a major feature, one field at a time at the moment. Table can have up to 50 million records. Most selects are done using the "id" field, which is the primary key and it's indexed.
A sample query:
SELECT * FROM USERS_TABLE WHERE id=10 ORDER BY number_field_1 DESC LIMIT 100;
The question:
Do I create separate indexes for each "number_field_*" to optimize ORDER BY statements, or is there a better way to optimize? Thanks
There's no silver bullet here, you will have to test this for your data and your queries.
If all your queries (e.g. WHERE id=10) return few rows, it might not be worth indexing the ORDER BY columns. On the other hand, if they return a lot of rows, it might help greatly.
If you always order by at least one of the fields, consider creating an index on that column, or a compound index on some of these columns - keep in mind that if you have a compound index on (num_field_1, num_field_2) and you order only by num_field_2, that index will not be used. You have to include the leftmost fields of the index to make use of it.
On the other hand, you seem to have a lot of different fields you order by; the drawback of creating an index on each and every one of them is that your indexes will be much larger, and your inserts/deletes/updates might start to get slower.
Basically, there is no shortcut. You'll have to play around with which indexes work best for your queries and tune accordingly, while carefully analyzing your queries with EXPLAIN.
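As a small illustration of the leftmost-prefix rule mentioned above (column names follow the question; the index name is just an example):
CREATE INDEX ix_users_nf1_nf2 ON USERS_TABLE (number_field_1, number_field_2);

-- Can use the index for the sort (scanned in reverse): number_field_1 is the leftmost column
EXPLAIN SELECT * FROM USERS_TABLE ORDER BY number_field_1 DESC LIMIT 100;

-- Cannot use this index for the sort: number_field_2 is not a leftmost prefix
EXPLAIN SELECT * FROM USERS_TABLE ORDER BY number_field_2 DESC LIMIT 100;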