I have a tricky question about MySQL.
The idea is to select the last n records from a table, filtering by a property (possibly one from another table). That simple.
At this point you probably want to reply:
let n = 10
SELECT *
FROM huge_table
JOIN another_table
ON another_table.id = huge_table.another_table_id
AND another_table.some_interesting_property
ORDER BY huge_table.id DESC
LIMIT 10
Without the JOIN that's fine: MySQL reads the index from the end and returns 10 rows, and the execution time is negligible.
With the JOIN, the execution time depends on the size of the table and in many cases is not negligible. EXPLAIN reports "Using where; Using index; Using temporary; Using filesort".
The MySQL documentation (http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html) lists this as a case where the index cannot be used:
"You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)"
which explains why MySQL can't use an index to resolve my ORDER BY and prefers a huge filesort instead.
My question is: is it natural to use ORDER BY ... LIMIT 10 to get the last items? Would you really do that to pick the last 10 cards from a deck sorted in ascending order? Personally, I would just pick 10 from the bottom.
I tried many variations, but they all led to the same conclusion: querying the first 10 elements is really fast, and querying the last 10 is slow because of the ORDER BY clause.
Can a "select last 10" really be fast? Where am I wrong?
Nice question. I think you should create a DESC index on the ORDER BY column, i.e. id.
That should do the trick. (Note that MySQL versions before 8.0 parse DESC in an index definition but ignore it.)
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
With the join you're now restricting rows to "some_interesting_property", and the IDs in your huge_table may no longer be consecutive. Try an index on another_table (some_interesting_property, id) and also on huge_table (another_table_id, id), and see if your EXPLAIN gives you better hints.
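The two indexes suggested above could be created roughly as follows. This is only a sketch: the index names are made up for illustration, and whether the optimizer actually picks them depends on your data and MySQL version, so check the EXPLAIN output afterwards.

```sql
-- Hypothetical index names; table and column names are taken from the question.
CREATE INDEX idx_property_id ON another_table (some_interesting_property, id);
CREATE INDEX idx_another_id ON huge_table (another_table_id, id);

-- Re-check the plan: ideally "Using temporary; Using filesort" is gone.
EXPLAIN
SELECT *
FROM huge_table
JOIN another_table
  ON another_table.id = huge_table.another_table_id
 AND another_table.some_interesting_property
ORDER BY huge_table.id DESC
LIMIT 10;
```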
I'm having trouble reproducing your situation. Whether I use ASC or DESC with my huge_table/another_table mock up, my EXPLAINs and execution time all show approx N rows read and a logical join. Which version of MySQL are you using?
Also, the EXPLAIN documentation states that Using index means:
"The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row."
That doesn't match the fact that you're doing a SELECT *, unless you have an index that covers your whole table.
Perhaps you should show your schema, including indexes, and the EXPLAIN output.
Consider a table Test with 1000 rows:
Test table
id    name  desc
1     Adi   test1
2     Sam   test2
3     Kal   test3
...
1000  Jil   test1000
If I need to fetch only a small subset, say 100 rows, I use a LIMIT clause in my query:
SELECT * FROM test LIMIT 100;
As I understand it, this query first fetches 1000 rows and then returns 100 of them.
Can this be optimised so that the DB engine reads only 100 rows and returns them
(instead of fetching all 1000 rows first and then returning 100)?
The reason for this supposition is that the order of processing will be
FROM
WHERE
SELECT
ORDER BY
LIMIT
You can combine LIMIT row_count with ORDER BY. This causes MySQL to stop sorting as soon as it has found the first row_count rows of the sorted result.
Hope this helps. If you need any clarification, just drop a comment.
The query you wrote will fetch only 100 rows, not 1000. But, if you change that query in any way, my statement may be wrong.
GROUP BY and ORDER BY are likely to incur a sort, which is arguably even slower than a full table scan. And that sort must be done before seeing the LIMIT.
Well, not always...
SELECT ... FROM t ORDER BY x LIMIT 100;
together with INDEX(x) -- This may use the index and fetch only 100 rows from the index. BUT... then it has to reach into the data 100 times to find the other columns that you ask for. UNLESS you only ask for x.
Etc, etc.
And here's another wrinkle. A lot of questions on this forum are "Why isn't MySQL using my index?" Back to your query. If there are "only" 1000 rows in your table, my example with the ORDER BY x won't use the index, because it is faster to simply read through the table, tossing 90% of the rows. On the other hand, if there were 9999 rows, then it would use the index. (The transition is somewhere around 20%, but that figure is imprecise.)
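To see which of these cases you are in, you can compare the plans of a covering and a non-covering variant. A minimal sketch, assuming a table t with a column x plus other columns; idx_x is a made-up index name, and the plans you actually get depend on table size and MySQL version:

```sql
ALTER TABLE t ADD INDEX idx_x (x);

-- Only x is requested, so the index alone can answer the query
-- (look for "Using index" in the Extra column).
EXPLAIN SELECT x FROM t ORDER BY x LIMIT 100;

-- All columns are requested, so each row found through the index needs an
-- extra lookup in the data; on a small table the optimizer may prefer a
-- full scan plus sort ("Using filesort") instead.
EXPLAIN SELECT * FROM t ORDER BY x LIMIT 100;
```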
Confused? Fine. Let's discuss one query at a time. I can [probably] discuss the what and why of each one you throw at me. Be sure to include SHOW CREATE TABLE, the full query, and EXPLAIN SELECT... That way, I can explain what EXPLAIN tells you (or does not).
Did you know that having both a GROUP BY and ORDER BY may cause the use of two sorts? EXPLAIN won't point that out. And sometimes there is a simple trick to get rid of one of the sorts.
There are a lot of tricks up MySQL's sleeve.
Given the following two queries:
Query #1
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623, 204072)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Query #2 - 4 IDs instead of 5
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
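Explain Plan
-- Query #1
id  select_type  table  type   possible_keys           key                     key_len  rows   Extra
1   SIMPLE       log    range  idx_user_id_and_log_id  idx_user_id_and_log_id  4        41280  Using index condition; Using where; Using filesort
-- Query #2
id  select_type  table  type   possible_keys           key                     key_len  rows   Extra
1   SIMPLE       log    index  idx_user_id_and_log_id  PRIMARY                 4        53534  Using where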
Why does the addition of a single ID make the execution plan so different? I'm talking about a difference in execution time from milliseconds to ~1 minute. I thought it could be related to the eq_range_index_dive_limit parameter, but the list is below 10 values anyway (the default limit). I know that I can force the use of the index instead of the clustered index, but I want to know why MySQL decided that.
Should I try to understand that? Or is it sometimes just not possible to understand the query planner's decisions?
Extra Details
Table Size: 11GB
Rows: 108 Million
MySQL: 5.6.7
Doesn't matter which ID is removed from the IN clause.
The index: idx_user_id_and_log_id(user_id, id)
As you have shown, MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
1. Read all qualifying rows, sort them, and pick the n top rows.
2. Read the rows in sorted order and stop when n qualifying rows have been found.
To decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed, or for columns whose values are correlated. In your case, one probably has to read a lot more of the table in sorted order to find the first 25 qualifying rows than the optimizer expected.
There have been several improvements in how LIMIT queries are handled, both in later releases of 5.6 (you are running on a pre-GA release!), and in newer releases (5.7, 8.0). I suggest you try to upgrade to a later release, and see if this still is an issue.
In general, if you want to understand query planner decisions, you should look at the optimizer trace for the query.
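For reference, a typical optimizer trace session looks roughly like this (the trace is available from MySQL 5.6 on). The statement in the middle is Query #1 from the question; the trace itself is a large JSON document, so in the mysql client it is easier to read with \G.

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement you want to analyze (Query #1 from the question).
SELECT log.id
FROM log
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

-- Read the trace of the most recent statement.
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```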
JOIN is much more efficient.
Create a temporary table with the values of the IN operator.
Then make a JOIN between table 'log' to the temporary table of values.
Refer to this answer for more info.
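A sketch of what this could look like for Query #1. The temporary table name tmp_user_ids is made up, and user_id is assumed to be an INT; whether this is actually faster than the IN list is something to verify with EXPLAIN and timing on your own data.

```sql
-- tmp_user_ids is a hypothetical name; user_id is assumed to be an INT.
CREATE TEMPORARY TABLE tmp_user_ids (
  user_id INT NOT NULL PRIMARY KEY
);

INSERT INTO tmp_user_ids (user_id)
VALUES (188858), (188886), (189854), (203623), (204072);

-- Same filter and ordering as Query #1, with the IN list replaced by a JOIN.
SELECT log.id
FROM log
JOIN tmp_user_ids ON tmp_user_ids.user_id = log.user_id
WHERE log.type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25;

DROP TEMPORARY TABLE tmp_user_ids;
```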
Add
INDEX(user_id, type, id),
INDEX(type, user_id, id)
Each of these is a "covering" index. As such, the entire query can be performed by looking only in one index, without touching the 'data'.
I am giving the Optimizer two choices. Hopefully it will be able to judge whether user_id IN (...) or type IN (...) is more selective, and pick the better index accordingly.
If, after adding those, you don't have any use for idx_user_id_and_log_id(user_id, id), DROP it.
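Spelled out as statements, that would be something like the following. The index names are invented for illustration; the DROP is left commented out because it should only run once you are sure nothing else depends on the old index.

```sql
-- Hypothetical index names; columns as suggested above.
ALTER TABLE log
  ADD INDEX idx_user_type_id (user_id, type, id),
  ADD INDEX idx_type_user_id (type, user_id, id);

-- Only once you have confirmed nothing else relies on the old index:
-- ALTER TABLE log DROP INDEX idx_user_id_and_log_id;
```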
(No, I can't explain why query 2 chose to do a table scan.)
I have two tables
LangArticles | columns: id (INT) ,de (VARCHAR),en (VARCHAR),count_links(INT)
WikiLinks | columns: article_id,link_id,nr_in_article (all integer)
The name of an article is in the columns de (German) and en (English).
The id in the LangArticles table is the same as the ids article_id and link_id.
I now want to get the names of all articles that link to a given article. For example, I want all articles that link to 'abc', which has id = 1.
So my normal query (without an order by) looks like:
select distinct(LA.de),W.nr_in_article,LA.count_links from
LangArticles as LA inner join WikiLinks as W on W.article_id = LA.id
where W.link_id in ("1")
This took maybe 0.001 seconds and gives me 100000 results. Actually, I only want the best 5 hits.
Best means in this case the most relevant ones. I want to sort it like this:
Articles that link to 'abc' near the beginning of the article (low nr_in_article) and that have a lot of links themselves (high count_links) should rank high.
I am using
order by (1-(W.nr_in_article/LA.count_links)) desc
for this.
The problem is that I am not sure how to optimize this order by.
EXPLAIN in MySQL says that it has to use a temporary file and a filesort and can't use the index on the order by keys. For testing, I also tried a simple order by W.nr_in_article, i.e. a plain sort on one key.
For your information my indices are:
in LangArticles: id (primary),de (unique),en (unique), count_links(index)
in WikiLinks: article_id(index),link_id(index),nr_in_article(index)
But I also tried these two multi-column indexes: (link_id, nr_in_article) and (article_id, nr_in_article).
And the query with order by took approximately 5.5 seconds. :(
I think I know why MySQL has to use a temporary file and a filesort here: all 100,000 entries have to be found with one index (link_id), and afterwards they have to be sorted, and in a temporary file it can't use an index.
But is there any way to make this faster?
Actually, I only want the best 5 hits, so there is no need to sort everything. I am not sure whether something like a naive sort (bubble sort) would be faster here than Quicksort, which sorts the whole temporary table.
Since you only need the top 5, I think you could split it into two queries that should yield fewer results.
First like Sam pointed out,
order by (W.nr_in_article/LA.count_links) asc
should be equivalent to your
order by (1-(W.nr_in_article/LA.count_links)) desc
unless I'm overlooking some corner case here.
Furthermore, with the ascending order above, anything where
W.nr_in_article < LA.count_links
will rank ahead of everything else and so make up the top 5, unless that result is empty, so I would try the query
select distinct(LA.de),W.nr_in_article,LA.count_links
from LangArticles as LA
inner join WikiLinks_2 as W on W.article_id = LA.id
and W.nr_in_article < LA.count_links
where W.link_id in ("1")
order by W.nr_in_article/LA.count_links
limit 5
Only if this returns fewer than 5 results do you have to run the query again with a changed where condition.
This, however, will not bring the runtime down by orders of magnitude, but it should help a little. If you need more performance, I don't see any other way than a materialized view, which I don't think is available in MySQL but can be simulated using triggers.
I have the following query:
SELECT *
FROM shop_user_member_spots
WHERE delete_flag = 0
ORDER BY date_spotted desc
LIMIT 10
When run, it takes a few minutes. The table is around 2.5 million rows.
Here is the table (not designed by me but I am able to make some changes):
And finally, here are the indexes (again, not made by me!):
I've been attempting to get this query running fast for hours now, to no avail.
Here is the output of EXPLAIN:
Any help / articles are much appreciated.
Based on your query, it seems the index you would want would be on (delete_flag, date_spotted). You have an index that has the two columns, but the id column is in between them, which would make the index unhelpful in sorting based on date_spotted. Now whether mysql will use the index based on Zohaib's answer I can't say (sorry, I work most often with SQL Server).
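As a sketch, the suggested index and a plan check could look like this. The index name is made up, and since the existing schema is not shown here, treat it as a starting point and not as a drop-in fix.

```sql
-- idx_delete_flag_date_spotted is a made-up name.
ALTER TABLE shop_user_member_spots
  ADD INDEX idx_delete_flag_date_spotted (delete_flag, date_spotted);

-- The equality on delete_flag plus the sort on date_spotted can now be
-- served from one index; check that "Using filesort" no longer appears.
EXPLAIN
SELECT *
FROM shop_user_member_spots
WHERE delete_flag = 0
ORDER BY date_spotted DESC
LIMIT 10;
```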
The problem I see in the explain plan is that the index on date_spotted is not being used; instead, a filesort is used to do the sorting. (As for the index on delete_flag: an index only gives a real performance benefit when the indexed column is selective, i.e. contains many distinct values.)
The MySQL documentation says an index will not be used for the ORDER BY clause if
the key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
I guess the same is the case here, although you can try using FORCE INDEX.
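A FORCE INDEX hint would be written roughly like this. The index name idx_date_spotted is an assumption, since the actual index list is not shown; substitute the real name from SHOW INDEX FROM shop_user_member_spots.

```sql
-- Assumes an index named idx_date_spotted exists on (date_spotted).
SELECT *
FROM shop_user_member_spots FORCE INDEX (idx_date_spotted)
WHERE delete_flag = 0
ORDER BY date_spotted DESC
LIMIT 10;
```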
I want to run a simple query to get the n oldest records in a table (it has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and sorting the entire table to get only n records does not seem reasonable.
(Assume n << size of table)
When you are concerned about performance, you should not dismiss ORDER BY too early.
Queries like that can be implemented as Top-N queries supported by an appropriate index. They run very fast because the database doesn't need to sort the entire table, or even the selected rows: the data is already sorted in the index.
example:
select *
from tbl
where A = ?
order by creation_date
limit 10;
Without an appropriate index this will be slow if you have lots of data. However, if you create an index like this:
create index test on tbl (A, creation_date);
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
A few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this, but try creating an index on the creation_date column. The index keeps its entries sorted in ascending order. Your select query can then use ORDER BY creation_date DESC with LIMIT 20 to get the first 20 rows in that order. The database engine should recognize that the index is already sorted and that no extra sort is needed, because the index is kept in order as rows are saved. All it needs to do is read the last 20 entries from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
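In SQL, that comes down to something like the following. The table and index names are placeholders, and n = 10 is chosen arbitrarily.

```sql
-- big_table and idx_creation_date are placeholder names.
CREATE INDEX idx_creation_date ON big_table (creation_date);

-- The n oldest rows (here n = 10): read from the start of the index.
SELECT * FROM big_table ORDER BY creation_date ASC LIMIT 10;

-- The n latest rows: read the same index from the other end.
SELECT * FROM big_table ORDER BY creation_date DESC LIMIT 10;
```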
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.