I have a MySQL db with 7 million records.
When I run a query like
select * from data where cat_id=12 order by id desc limit 0,30
the query takes a long time, about 0.4603 sec,
but the same query without the (where cat_id=12) or without the (order by id desc) is very fast,
taking only about 0.0002 sec.
I have indexes on cat_id and id.
Is there any way to make the query with both the where and the order by fast?
thanks
Create a composite index that combines cat_id and id. See http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html for syntax and examples.
If you state 'cat_id=12' only, you will get all matching rows, which is fast, because of the index. But these rows won't be ordered, so mysql has to read them all into a temporary table and sort that table, which is slow.
Similarly, 'order by id desc' will order the rows quickly, but mysql has to read all of them to find out which have 'cat_id=12', which is slow.
A composite index should solve these issues.
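For example, a minimal sketch assuming the table and columns from the question (the index name is illustrative):
-- Composite index: the WHERE column first, then the ORDER BY column, so MySQL
-- can filter on cat_id and read the matching rows already sorted by id.
ALTER TABLE data ADD INDEX idx_cat_id_id (cat_id, id);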
It runs fast without the order by because when you write order by ... DESC, it first iterates through all the matching rows and only then selects them in descending order. Removing the clause leaves the default ASCENDING order, which makes it fast.
Also, it may be that your index is sorted ascending, so when you ask for descending order it needs to do a lot more work to return the rows in that order.
Related
1. select * from inv_inventory_change limit 1000000,10
2. select id from inv_inventory_change limit 1000000,10
The first SQL's time consumption is about 1.6 s; the second SQL's is about 0.37 s;
so the differential between the 1st and 2nd SQL is about 1.27 s.
I understand MySQL will use a covering index when only indexed columns are queried; that is why 'select id' is faster.
However, when I execute the "where id in (...)" SQL below, I found it only took about 0.2 s, which is much shorter than the differential (1.27 s), and this is confusing me:
select * from inv_inventory_change c where c.id in (1013712,1013713,1013714,1013715,1013716,1013717,1013718,1013719,1013720,1013721);
My key question is why the time differential is much bigger than the "where id in" SQL.
The inv_inventory_change table has 2321211 records.
Adding 'order by id asc' to the above SQLs does not change the time consumption.
The rule is very simple; the first query below can be served without reading the actual row data from disk or the memory cache.
select id from inv_inventory_change limit 1000000,10
This can be directly served from the index table (B-Tree or its variant) without reading page information and other meta information.
select * from inv_inventory_change limit 1000000,10
This query will require two steps to fetch records. First, it will perform a query on the index table, which is quick, but next it needs to read the page information for those records, which requires disk I/O, storing in the cache, etc. Since a LIMIT is applied, it will automatically sort for you depending on the default ORDER BY setting, most likely using the id field. Since you're selecting a large number of records it will use FileSort or something similar to sort them.
select * from inv_inventory_change c where c.id in (1013712,1013713,1013714,1013715,1013716,1013717,1013718,1013719,1013720,1013721);
This query would be served using a range scan on the index table; it can find the entry corresponding to 1013712 in O(log N) time and should be able to serve the query quickly.
You should also look at the number of records you're reading, e.g. the query with limit 1000000,10 will require many disk I/Os due to the large number of entries, whereas the 3rd example reads only a handful of pages.
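If you want to see this yourself, EXPLAIN can show whether a query is served entirely from the index (a quick sketch; the exact output varies by MySQL version):
-- "Using index" in the Extra column means a covering (index-only) scan,
-- with no lookup of the full rows.
EXPLAIN select id from inv_inventory_change limit 1000000,10;
-- The select * variant will not show "Using index", because the full rows
-- must be fetched after the index scan.
EXPLAIN select * from inv_inventory_change limit 1000000,10;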
I have this query which is very simple, but I don't want to use an index here due to some constraints.
So my worry is how to avoid a huge load on the server if we are referencing a non-indexed column in the where clause.
The solution, I feel, will be LIMIT.
I am sure the data I need is within 1000 rows, so if I use LIMIT I can check the available values.
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
Here the student column is not indexed in MySQL, so my worry is that it will cause a huge load on the server.
I tried with EXPLAIN and it seems to be OK, but the problem is the table has few rows now, and as you know MySQL goes crazy with more data, like millions of rows.
So what are my options?
Should I add an index for student?
If I add the index then I don't need 1000 rows in the limit; one row is sufficient. But as I said, the table is going to have several million rows, so the index requires a lot of space. That is why I was thinking of avoiding an index on the student column; and my other thought is that fetching 1000 rows in desc order should not cause load on the server since id is indexed.
Any help will be great.
You say:
but i dont want to use index here due to some constraints...
and also say:
how to avoid huge load on server...
If you don't use an index, you'll produce "huge load" on the server. If you want this query to be less resource intensive, you need to add an index. For the aforementioned query the ideal index is:
create index ix_student_status_id on tableA (student, status, id); -- the index name here is arbitrary
This index should make your query very fast, even with millions of rows.
LIMIT 100 doesn't force the database to search only within the first 100 rows.
It just stops searching after 100 matches are found.
So, by itself, it is not a performance optimization.
In the query below
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
The query will run until it finds 1000 matches.
It is not restricted to scanning only the first 1000 rows; it may scan many more rows before 1000 matches are found.
So this is the behaviour of the above query:
int nb_rows_matched = 0;
while (nb_rows_matched < 1000){
    // scan the next row; count it only if it satisfies the WHERE clause
    if (search_for_match())
        nb_rows_matched++;
}
Lots of threads on the web already; I'm just trying to understand some nuances which had me confused!
Quoting the doc reference
If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as
soon as it has found the first row_count rows of the sorted result,
rather than sorting the entire result. If ordering is done by using an
index, this is very fast.
and a SO thread
It will order first, then get the first 20. A database will also
process anything in the WHERE clause before ORDER BY.
Taking the same query from the question :
SELECT article
FROM table1
ORDER BY publish_date
LIMIT 20
Let's say the table has 2000 rows, of which the query is expected to return 20. Now, looking at the MySQL reference, "...stops sorting as soon as it has found the first row_count rows..." confuses me, as I find it a little ambiguous!!
Why does it say "stops sorting"? Isn't the limit clause applied to already sorted data returned by the order by clause (assuming it's a non-indexed column)? Or is my understanding wrong, and SQL is limiting first and then sorting!!??
The optimization mentioned in the documentation generally only works if there's an index on the publish_date column. The values are stored in the index in order, so the engine simply iterates through the index of the column, fetching the associated rows, until it has fetched 20 rows.
If the column isn't indexed, the engine will generally need to fetch all the rows, sort them, and then return the first 20 of these.
It's also useful to understand how this interacts with WHERE conditions. Suppose the query is:
SELECT article
FROM table1
WHERE last_read_date > '2018-11-01'
ORDER BY publish_date
LIMIT 20
If publish_date is indexed and last_read_date is not, it will scan the publish_date index in order, test the associated last_read_date against the condition, and add article to the result set if the test succeeds. When there are 20 rows in the result set it will stop and return it.
If last_read_date is indexed and publish_date is not, it will use the last_read_date index to find the subset of all the rows that meet the condition. It will then sort these rows using the publish_date column, and return the first 20 rows from that.
If neither column is indexed it will do a full table scan to test last_read_date, sort all the rows that match the condition, and return the first 20 rows of this.
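A rough sketch of the single-column indexes discussed in the first two cases (the index names are illustrative, not from the original question):
-- Case 1: publish_date indexed; the index is scanned in order and the scan
-- stops once 20 rows have passed the last_read_date test.
CREATE INDEX idx_publish_date ON table1 (publish_date);
-- Case 2: last_read_date indexed; the matching rows are found via this index,
-- then sorted by publish_date before the first 20 are returned.
CREATE INDEX idx_last_read_date ON table1 (last_read_date);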
MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result
This is actually a very sensible optimisation within mysql. If you use limit to return 20 rows and mysql knows it already found them, then why would mysql (or you) care how exactly the rest of the records are sorted? It does not matter, therefore mysql stops sorting the rest of the rows.
If the order by is done on an indexed column, then mysql can tell pretty quickly whether it has found the top n records.
I'm finding the following a little perplexing... if I perform the below queries, when sorting by the indexed value 'keyword' it takes 0.0008 seconds, but when sorting by 'count' it takes over 3 seconds.
The following takes approx 0.0008 seconds:
SELECT keyword, COUNT(DISTINCT pmid) as count
FROM keywords
WHERE (collection_id = 13262022107433)
GROUP BY keyword
order by keyword desc limit 1;
This takes over 3 seconds:
SELECT keyword, COUNT(DISTINCT pmid) as count
FROM keywords
WHERE (collection_id = 13262022107433)
GROUP BY keyword
order by count desc limit 1;
Is there a way of speeding up a sort on a result set when sorting by count? Should it really take that much longer? Are there any alternatives? The engine is InnoDB.
Many thanks for your input!
You may want to add an additional index to assist in the counting phase.
ALTER TABLE keywords ADD INDEX ckp_index (collection_id,keyword,pmid);
If you already have a compound index with collection_id and keyword only, the Query Optimizer will still include a lookup for the pmid field from the table.
By adding this new index, you remove any lookups against the table and perform index scans only.
This will speed up the count(distinct pmid) portion of the query.
Give it a Try !!!
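To check whether the new index is actually used as a covering index, EXPLAIN helps (a quick sketch; the details of the output vary by MySQL version):
-- If ckp_index is chosen and Extra shows "Using index", the
-- count(distinct pmid) is computed from the index alone, with no table lookups.
EXPLAIN SELECT keyword, COUNT(DISTINCT pmid) as count
FROM keywords
WHERE (collection_id = 13262022107433)
GROUP BY keyword
order by keyword desc limit 1;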
Not unexpected, not avoidable. When this query is ordered by keyword, MySQL can just look at what keyword comes last, pick out the rows with that keyword, and count them. When you order by count, though, it has to count the rows for every keyword to figure out which one is highest. That's a lot more work!
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can I get that without using "order by"? It is a very big table, and using order by on the entire table to get only "n" records does not seem reasonable.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. It runs very fast because it doesn't need to sort the entire table, nor even the selected rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
Without an appropriate index it will be slow if you have lots of data. However, if you create an index like this:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use it.
A few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which will automatically keep the rows sorted in ascending order. Then your select query can use order by creation_date desc with LIMIT 20 to get the first 20 records. The database engine should realize the index has already done the sorting work and won't actually need to sort, because the index sorted the data on save. All it needs to do is read the last 20 records from the index.
Worth a try.
Create an index on creation_date and query using order by creation_date asc|desc limit n; the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
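A minimal sketch of the index-plus-order-by approach described above (the table and index names are placeholders, since the question doesn't name the table):
CREATE INDEX idx_creation_date ON my_table (creation_date);
-- The n newest rows are read straight off the index in reverse order;
-- no separate sort step is needed.
SELECT *
FROM my_table
ORDER BY creation_date DESC
LIMIT 10;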
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.