I have a simple query, but due to some constraints I don't want to add an index for it.
So my worry is how to avoid heavy load on the server when the WHERE clause filters on a non-indexed column.
The solution, I feel, is LIMIT.
I am sure the data I need is within the first 1000 matching rows, so if I use LIMIT I can check the available values.
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
Here the student column is not indexed in MySQL, so my worry is that this will cause heavy load on the server.
I tried EXPLAIN and it looks OK, but the problem is the table currently has few rows, and as you know MySQL behaves very differently once there are millions of rows.
So what are my options?
Should I add an index on student?
If I add the index I won't need LIMIT 1000; one row would be sufficient. But as I said, the table is going to grow to several million rows, so the index would take a lot of space, which is why I was hoping to avoid indexing the student column. My other thought was that fetching 1000 rows ordered by id DESC should not cause much load, since id is indexed.
Any help will be great.
You say:
due to some constraints I don't want to add an index...
and also say:
how to avoid heavy load on the server...
If you don't use an index, you'll produce "huge load" on the server. If you want this query to be less resource intensive, you need to add an index. For the aforementioned query the ideal index is:
CREATE INDEX idx_student_status ON tableA (student, status, id);
This index should make your query very fast, even with millions of rows.
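If you want to see the difference such an index makes, EXPLAIN is the tool. Here is a hedged, runnable sketch of the same table shape using SQLite from Python (SQLite's planner is not MySQL's, and the table and index names are just for illustration, but the switch from a full table scan to an index search is the same idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (id INTEGER PRIMARY KEY, student TEXT, status TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT * FROM tableA WHERE status = '1' AND student = 'S1' "
         "ORDER BY id DESC LIMIT 1000")

before = plan(query)  # without the index: a full table scan
conn.execute("CREATE INDEX idx_student_status ON tableA (student, status, id)")
after = plan(query)   # with the index: a search on idx_student_status
print(before)
print(after)
```

The second plan seeks directly to the (student, status) entries and reads them already in id order, so there is no separate sort step.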
LIMIT 1000 doesn't force the database to look only at the first 1000 rows of the table.
It just stops searching after 1000 matches are found.
So, on its own, it is not a performance tool.
In the query below
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
The query will run until it finds 1000 matches, and it may have to scan far more than the first 1000 rows to do that.
So the behaviour of the above query is roughly:
int nb_rows_matched = 0;
while (nb_rows_matched < 1000 && !end_of_table()) {
    if (next_row_matches())
        nb_rows_matched++;
}
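That loop can be observed directly. The sketch below uses Python with SQLite (not MySQL, but the stopping behaviour is the same, and the table contents are made up for the demonstration): a counting function planted in the WHERE clause records how many rows the engine actually examined before the LIMIT was satisfied.

```python
import sqlite3

examined = {"n": 0}
def seen(value):
    # called once per row the engine tests, so it counts examined rows
    examined["n"] += 1
    return value

conn = sqlite3.connect(":memory:")
conn.create_function("seen", 1, seen)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status INTEGER)")
# 100 rows; only every 10th row has status = 1
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, 1 if i % 10 == 0 else 0) for i in range(1, 101)])

rows = conn.execute(
    "SELECT id FROM t WHERE seen(status) = 1 LIMIT 5").fetchall()
print(rows)             # the first 5 matches: ids 10, 20, 30, 40, 50
print(examined["n"])    # rows examined to find them: 50, not 5
```

The engine stopped as soon as the 5th match appeared, but it still had to test 50 rows to get there; with rarer matches it would scan even further.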
Related
Consider a table Test having 1000 rows
Test Table
id    name  desc
1     Adi   test1
2     Sam   test2
3     Kal   test3
...
1000  Jil   test1000
If I need to fetch only, say, 100 rows (i.e. a small subset), I use the LIMIT clause in my query:
SELECT * FROM test LIMIT 100;
My supposition is that this query first fetches all 1000 rows and then returns 100 of them.
Can this be optimised, so that the DB engine reads only 100 rows and returns them
(instead of fetching all 1000 rows first and then returning 100)?
The reason for the supposition is that the logical order of processing is:
FROM
WHERE
SELECT
ORDER BY
LIMIT
You can combine LIMIT row_count with ORDER BY; this causes MySQL to stop sorting as soon as it has found the first row_count rows of the sorted result.
Hope this helps, If you need any clarification just drop a comment.
The query you wrote will fetch only 100 rows, not 1000. But if you change that query in any way, my statement may no longer hold.
GROUP BY and ORDER BY are likely to incur a sort, which is arguably even slower than a full table scan. And that sort must be done before seeing the LIMIT.
Well, not always...
SELECT ... FROM t ORDER BY x LIMIT 100;
together with INDEX(x) -- This may use the index and fetch only 100 rows from the index. BUT... then it has to reach into the data 100 times to find the other columns that you ask for. UNLESS you only ask for x.
Etc, etc.
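To make the ORDER BY x LIMIT case concrete, here is a sketch in Python with SQLite (not MySQL, so treat the plan strings as illustrative): without INDEX(x) the plan needs a sort step, and with it, selecting only x walks a covering index and the sort disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")

def plan(sql):
    # join the "detail" column of each EXPLAIN QUERY PLAN row
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

without_index = plan("SELECT x FROM t ORDER BY x LIMIT 100")
conn.execute("CREATE INDEX ix ON t (x)")
covering = plan("SELECT x FROM t ORDER BY x LIMIT 100")
print(without_index)  # a scan plus a temporary B-tree for the sort
print(covering)       # a covering-index scan, no sort step
```

If the query also asked for y, each index entry would require an extra lookup into the table, which is the "reach into the data 100 times" cost described above.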
And here's another wrinkle. A lot of questions on this forum are "Why isn't MySQL using my index?" Back to your query. If there are "only" 1000 rows in your table, my example with the ORDER BY x won't use the index, because it is faster to simply read through the table, tossing 90% of the rows. On the other hand, if there were 9999 rows, then it would use the index. (The transition is somewhere around 20%, but that is imprecise.)
Confused? Fine. Let's discuss one query at a time. I can [probably] discuss the what and why of each one you throw at me. Be sure to include SHOW CREATE TABLE, the full query, and EXPLAIN SELECT... That way, I can explain what EXPLAIN tells you (or does not).
Did you know that having both a GROUP BY and ORDER BY may cause the use of two sorts? EXPLAIN won't point that out. And sometimes there is a simple trick to get rid of one of the sorts.
There are a lot of tricks up MySQL's sleeve.
1. select * from inv_inventory_change limit 1000000,10
2. select id from inv_inventory_change limit 1000000,10
The first SQL takes about 1.6 s, the second about 0.37 s, so the difference between them is about 1.27 s.
I understand MySQL will use a covering index when only an indexed column is queried, which is why SELECT id is faster.
However, the IN-list query below takes only about 0.2 s, which is much shorter than the 1.27 s difference, and that confuses me:
select * from inv_inventory_change c where c.id in (1013712,1013713,1013714,1013715,1013716,1013717,1013718,1013719,1013720,1013721);
My key question is why the time difference is so much bigger than the WHERE id IN query's time.
The inv_inventory_change table has 2,321,211 records.
Adding ORDER BY id ASC to the SQLs above does not change the timings.
The rule is very simple: your second query can be served without reading the table rows at all.
select id from inv_inventory_change limit 1000000,10
This can be answered directly from the index (a B-Tree or one of its variants) without fetching the row pages and the other column data.
select * from inv_inventory_change limit 1000000,10
This query requires more work to fetch records: it still has to step past the first 1,000,000 entries, and it reads full row data (page reads, possibly disk I/O and cache churn) rather than slim index entries. Because it cannot be served from the index alone, the long walk touches far more data than the covering-index version, and that is where the extra time goes.
select * from inv_inventory_change c where c.id in (1013712,1013713,1013714,1013715,1013716,1013717,1013718,1013719,1013720,1013721);
This query is served by point lookups on the primary-key index: each listed id is found in O(log N) time, so it reads only a handful of pages and returns quickly.
You should also compare the number of entries each query has to read: the query with LIMIT 1000000,10 must step through over a million entries, which can mean a lot of I/O, whereas the third example reads only a handful of pages.
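The offset cost is easy to count. A hypothetical sketch (Python with SQLite; the row counts are made up, but MySQL steps through an OFFSET the same way): a counting function in the WHERE clause shows that LIMIT 10 OFFSET 1000 examines 1010 entries, not 10.

```python
import sqlite3

examined = {"n": 0}
def seen(value):
    examined["n"] += 1  # one call per row the engine steps through
    return value

conn = sqlite3.connect(":memory:")
conn.create_function("seen", 1, seen)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 2001)])

rows = conn.execute(
    "SELECT id FROM t WHERE seen(id) > 0 LIMIT 10 OFFSET 1000").fetchall()
print(rows[0])          # (1001,): the offset skipped ids 1..1000
print(examined["n"])    # 1010 rows were still stepped through
```

Scaled up to OFFSET 1000000, that skipping is the bulk of the 1.6 s.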
Sorry for a cryptic title... My issue:
I have a MySQL query which, in its most simplified form, looks like this:
SELECT * FROM table
WHERE some_conditions
ORDER BY `id` DESC
LIMIT 50
Without the LIMIT clause the query would return around 50,000 rows, but I am only ever interested in the first 50. I realise that because of the ORDER BY, MySQL has to collect all matching results into a temporary table, sort all 50,000 of them, and only then return the first 50.
When I compare this query against the same query without ORDER BY, I get a staggering difference: 1.8 seconds vs 0.02 seconds.
Given that id is an auto-incrementing primary key, I thought there should be an elegant workaround for my problem. Is there one?
Are the some_conditions such that you could give the query an ID range? At the very least, you could limit the number of rows that go into the temporary table before the sort.
For example:
SELECT * FROM table
WHERE some_conditions
AND id BETWEEN 1 AND 50;
Alternatively, maybe a nested query to get the min and max IDs, if the some_conditions prevent you from making that sort of assumption about the range of IDs in your result.
If the performance is really that important, I would de-normalize the data by creating another table or cache of the first 50 results and keeping it updated separately.
Say I've a table with 100,000 rows, and I want to fetch rows in slabs of 50 with a particular WHERE clause.
The standard way of doing this: select * from table where userid=5 limit 50 offset 90500;
This runs awfully slowly.
Cause: all 100,000 rows are analyzed first and the LIMIT is applied at the last stage.
Any thoughts on how to speed this up? Anyone?
Putting an index on "userid" should really help.
Use a primary key in the ORDER BY, as galz says, e.g. ORDER BY id, assuming your table has an id column.
Alternatively, try using an indexed field (a generalization of 1).
You may also use partitions.
1 - Using ORDER BY you can improve, but not by much;
2 - Enabling the query cache and then selecting from the cache may improve the query;
3 - Adding an index helps, but not by much on its own either.
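One more option the answers above don't name: keyset pagination (the "seek method"). Instead of an OFFSET, remember the last id of the previous page and filter on id > last_id, so that with an index on (userid, id) the engine seeks straight to the page instead of discarding 90,500 rows. A sketch with assumed table and column names, in Python with SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, userid INTEGER)")
conn.execute("CREATE INDEX ix_user ON t (userid, id)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i % 7) for i in range(1, 100001)])

# OFFSET pagination: the engine walks and discards the first 500 matches
offset_page = conn.execute(
    "SELECT id FROM t WHERE userid = 5 ORDER BY id LIMIT 50 OFFSET 500"
).fetchall()

# Keyset pagination: seek past the last id of the previous page instead
prev_page_last = conn.execute(
    "SELECT id FROM t WHERE userid = 5 ORDER BY id LIMIT 1 OFFSET 499"
).fetchone()[0]
keyset_page = conn.execute(
    "SELECT id FROM t WHERE userid = 5 AND id > ? ORDER BY id LIMIT 50",
    (prev_page_last,)).fetchall()

print(offset_page == keyset_page)  # same page, reached without the big skip
```

The trade-off is that keyset pagination only supports "next page" style navigation, not jumping to an arbitrary page number.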
I want to run a simple query to get the n oldest records in the table (it has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and sorting the entire table just to get n records is not very convincing.
(Assume n << size of table.)
When you are concerned about performance, you should probably not discard the use of ORDER BY too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. It runs very fast because it doesn't need to sort the entire table, not even the selected rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
Without an appropriate index it will be slow if you have lots of data. However, if you create an index like this:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no WHERE clause, just put the ORDER BY columns into the index. The ORDER BY must match the index definition, especially if there are mixed ASC/DESC orders.
The indexed Top-N query is the performance king; make sure to use it.
A few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which keeps the entries sorted in ascending order. Your SELECT query can then use ORDER BY creation_date DESC with LIMIT 20 to get the latest 20 records. The database engine should realise the index has already done the sorting work and won't actually need to sort, because the index sorted the data on save. All it needs to do is read the last 20 entries from the index.
Worth a try.
Create an index on creation_date and query using ORDER BY creation_date ASC|DESC LIMIT n, and the response will be very fast (in fact it can hardly be faster). For the "latest n" scenario you need DESC.
If you want more constraints on this query (e.g. WHERE state='LIVE'), it may become very slow again and you'll need to reconsider the indexing strategy.
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.