I have 50K records in my table. This is my query.
SELECT * FROM user_details WHERE user_id > 10 AND status = 1 LIMIT 0,10
When I "EXPLAIN" this query it is still traversing 24959 rows. Can it be more optimizable so that it can traverse less rows?
You're filtering by both user_id > 10 and status = 1.
There is no index on status.
user_id > 10 matches nearly every row in the table, so the optimizer has decided that a full table scan will be faster than using an index.
You can fix this by adding an index on status. A longer-term solution might be to partition the table on status.
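For example, a minimal sketch of that fix (the index name is a placeholder; making it a composite index lets MySQL filter on status and then range-scan user_id within it):
ALTER TABLE user_details ADD INDEX ix_status_user (status, user_id);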
Side note: rather than relying on magic user_ids, consider adding an explicit field to the user table to indicate each user's role.
I have this query, which is very simple, but I don't want to use an index here due to some constraints.
So my worry is how to avoid a huge load on the server when a non-indexed column is used in the WHERE clause.
The solution, I feel, will be LIMIT.
I am sure the data fits in 1000 rows, so if I use LIMIT I can check the available values.
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
Here the student column is not indexed in MySQL, so my worry is that it will cause a huge load on the server.
I tried with EXPLAIN and it seems to be OK, but the problem is the small number of rows in the table; as you know, MySQL goes crazy with more data, like millions of rows.
So what are my options?
Should I add an index on student?
If I add the index, then I don't need 1000 rows in the LIMIT; one row is sufficient. But as I said, the table is going to hold several million rows, so the index requires a lot of space. That is why I was thinking of avoiding an index on the student column. My other thought is that fetching 1000 rows in descending order should not cause load on the server, since id is indexed.
Any help will be great.
You say:
but i dont want to use index here due to some constraints...
and also say:
how to avoid huge load on server...
If you don't use an index, you'll produce "huge load" on the server. If you want this query to be less resource-intensive, you need to add an index. For the aforementioned query, the ideal index is:
create index ix_student_status_id on tableA (student, status, id);
This index should make your query very fast, even with millions of rows.
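To verify that the index is picked up, you could check the plan (a sketch; the student number is a placeholder):
EXPLAIN SELECT *
FROM tableA
WHERE status='1' AND student='12345'
ORDER BY id DESC
LIMIT 1000;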
LIMIT 1000 doesn't force the database to search only the first 1000 rows.
It just stops searching after 1000 matches are found.
So it is not, by itself, a performance optimization.
In the query below
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
The query will run until it finds 1000 matches; it is not restricted to searching only the first 1000 rows.
So this is the behaviour of the above query:
int nb_rows_matched = 0;
// scan rows one by one; stop after 1000 matches or when the table is exhausted
while (nb_rows_matched < 1000 && !end_of_table) {
    if (search_for_match())
        nb_rows_matched++;
}
For example, I have the following table:
table Product
------------
id
category_id
processed
product_name
This table has indexes on id, category_id, processed, and (category_id, processed). The statistics for this table are:
select count(*) from Product; -- 50M records
select count(*) from Product where category_id=10; -- 1M records
select count(*) from Product where processed=1; -- 30M records
The simplest query I want to run is (SELECT * is a must):
select * from Product
where category_id=10 and processed=1
order by id ASC LIMIT 100
The above query without the LIMIT matches only about 10,000 records.
I want to call the above query multiple times. After each call I update the processed field to 0 for the rows I fetched (so they will not appear in the next query). When I test on the real data, sometimes the optimizer tries to use id as the key, which costs a lot of time.
How can I optimize the above query (in general terms)?
P/S: to avoid confusion, I know that the best index would be (category_id, processed, id), but I cannot change the indexes. My question is only about optimizing the query.
Thanks
For this query:
select *
from Product
where category_id = 10 and processed = 1
order by id asc
limit 100;
The optimal index is on product(category_id, processed, id). This is a single index with a three-part key, with the keys in this order.
Given that you have INDEX(category_id, processed), there is virtually no advantage in also having just INDEX(category_id). So DROP the latter.
That may have the beneficial side effect of pushing the Optimizer toward the composite INDEX(category_id, processed), which is at least "better" for the query.
Without touching the indexes, you could use a FORCE INDEX mentioning the composite index's name. But I don't recommend it. "It may help today, but hurt tomorrow, after the data changes."
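For reference, a sketch of what that would look like, assuming the composite index is named cat_proc (a hypothetical name; check SHOW CREATE TABLE for the real one):
SELECT * FROM Product FORCE INDEX (cat_proc)
WHERE category_id = 10 AND processed = 1
ORDER BY id ASC LIMIT 100;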
Why do you say "But I cannot change the index"? Newer versions of MySQL/MariaDB make ADD/DROP INDEX much faster than older versions. Also, pt-online-schema-change provides a fast way to do it online.
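If you can change them after all, a sketch (index names are placeholders; look up the real names first):
ALTER TABLE Product
  DROP INDEX ix_category,
  ADD INDEX ix_cat_proc_id (category_id, processed, id);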
So suppose I have this query:
select * from tablea where budget > 100 and users > 1000
Separate queries:
select * from tablea where budget > 100
and
select * from tablea where users > 1000
Each completes in ~0.01 seconds.
However, when the conditions are combined the query takes 1-2 seconds to complete.
I have an index on both columns (budget, users).
Does any smart person on SOF know why this is happening, and how to optimize it in this case?
Extra notes:
The first query matches 100,000 rows, the second matches 100,000 rows, and the combined query matches 3 rows.
The table has over 1 million rows.
EXPLAIN shows that the compound index (budget, users) is used; EXPLAIN EXTENDED shows 1,477,594 rows with about 50% filtered.
EXPLAIN doesn't really know, so it says exactly 50%. Don't trust that.
You used a LIMIT, and enough rows early in the table met the single criterion, so those queries could stop early.
But it had to scan the entire table to find the 3 rows for the combined filter.
No index is useful for two ranges (budget > 100 and users > 1000), but either of these will help:
INDEX(budget, ...)
INDEX(users, ...)
The ... could be anything or nothing. One index may be used, but only for the first column.
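Concretely, a minimal sketch of the two candidates (index names are illustrative):
CREATE INDEX ix_budget ON tablea (budget);
CREATE INDEX ix_users ON tablea (users);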
There is a chance that both such indexes could be used, via the "index merge" facility. If it were used, the process would be to pull 100K ids from the budget index and 100K ids from the users index, find the 3 ids that are in both, and finally look up the rows (*) for them. The Optimizer decided that was not worth it, based on "50%" or "100K" or the phase of the moon.
Since 10% of the table meets the single criterion, the Optimizer may decide that it is faster to scan the table than to bounce back and forth between the index and the data. Please provide SHOW CREATE TABLE and EXPLAIN SELECT.
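That is (using the table name from the question):
SHOW CREATE TABLE tablea;
EXPLAIN SELECT * FROM tablea WHERE budget > 100 AND users > 1000;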
I am having a problem with the following task in MySQL. I have a table Records(id, enterprise, department, status), where id is the primary key, enterprise and department are foreign keys, and status is an integer value (0 - CREATED, 1 - APPROVED, 2 - REJECTED).
Now, usually the application needs to filter by a concrete enterprise, department, and status:
SELECT * FROM Records WHERE status = 0 AND enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
The ORDER BY is required, since I have to show the user the most recent records. For this query I have created an index on (enterprise, department, status), and everything works fine. However, for some privileged users the status should be omitted:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
This obviously breaks the index usage - it's still good for filtering, but not for sorting. So, what should I do? I don't want to create a separate index on (enterprise, department), so what if I modify the query like this:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;
MySQL definitely does use the index now, since it is provided with values for status, but how quick will the sorting by primary key be? Will it take the most recent 10 values for each status and then merge them, or will it first merge the ids for all the statuses and only then take the first ten (which I guess would be much slower)?
All of the queries will benefit from one composite index:
INDEX(enterprise, department, status, id)
enterprise and department can be swapped, but keep the rest of the columns in that order.
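As a concrete statement (the index name is a placeholder):
ALTER TABLE Records ADD INDEX ix_ent_dept_status_id (enterprise, department, status, id);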
The first query will use that index for both the WHERE and the ORDER BY, and will thereby be able to find the 10 rows without scanning the table or doing a sort.
The second query is missing status, so my index is less than perfect. This would be better:
INDEX(enterprise, department, id)
At that point, it works like above. (Note: If the table is InnoDB, then this 3-column index is identical to your 2-column INDEX(enterprise, department) -- the PK is silently included.)
The third query gets dicier because of the IN. Still, my 4-column index will be nearly the best. It will use the first 3 columns, but it cannot use id for the ORDER BY, and it won't be able to consume the LIMIT. Hence the EXPLAIN will say Using temporary and/or Using filesort. Don't worry; performance should still be good.
My second index is not as good for the third query.
See my Index Cookbook.
"How quick will sorting by id be"? That depends on two things.
Whether the sort can be avoided (see above);
How many rows in the query without the LIMIT;
Whether you are selecting TEXT columns.
I was careful to say whether the INDEX is used all the way through the ORDER BY, in which case there is no sort, and the LIMIT is folded in. Otherwise, all the rows (after filtering) are written to a temp table, sorted, then 10 rows are peeled off.
The "temp table" I just mentioned is necessary for various complex queries, such as those with subqueries, GROUP BY, ORDER BY. (As I have already hinted, sometimes the temp table can be avoided.) Anyway, the temp table comes in 2 flavors: MEMORY and MyISAM. MEMORY is favorable because it is faster. However, TEXT (and several other things) prevent its use.
If MEMORY is used then Using filesort is a misnomer -- the sort is really an in-memory sort, hence quite fast. For 10 rows (or even 100) the time taken is insignificant.
I want to run a simple query to get the n oldest records in the table (it has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and sorting the entire table just to get n records does not seem convincing.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. Such a query runs very fast because it doesn't need to sort the entire table, nor even the selected rows, since the data is already sorted in the index.
example:
select *
from tbl
where A = ?
order by creation_date
limit 10;
Without an appropriate index it will be slow if you have lots of data. However, if you create an index like this:
create index test on tbl (A, creation_date);
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
Indexed Top-N queries are the performance kings--make sure to use them.
A few links for further reading (all mine):
How to use an index efficiently in a MySQL query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which keeps the entries sorted in ascending order. Then your SELECT can use ORDER BY creation_date DESC with LIMIT 20 to get the first 20 records. The database engine should realize the index has already done the sorting work and won't actually need to sort, because the index was sorted on save. All it needs to do is read the last 20 entries from the index.
Worth a try.
Create an index on creation_date and query using ORDER BY creation_date ASC|DESC LIMIT n, and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need DESC.
If you want more constraints on this query (e.g. WHERE state='LIVE'), the query may become very slow and you'll need to reconsider the indexing strategy.
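A minimal sketch of that approach (table and index names are illustrative):
CREATE INDEX ix_created ON records (creation_date);
-- the n oldest rows; the index is read in order, so no sort is needed
SELECT * FROM records ORDER BY creation_date ASC LIMIT 10;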
You can use GROUP BY if you are grouping some data, and then a HAVING clause to select specific groups.
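For example (a minimal sketch with a hypothetical orders table), this keeps only the groups with more than 10 rows:
SELECT category, COUNT(*) AS cnt
FROM orders
GROUP BY category
HAVING cnt > 10;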