Suppose you have a table with the following columns:
id
date
col1
I would like to be able to query this table with a specific id and date, and also order by another column. For example,
SELECT * FROM TABLE WHERE id = ? AND date > ? ORDER BY col1 DESC
According to this range documentation, an index will stop being used after it hits the > operator. But according to this order by documentation, an index can only be used to optimize the order by clause if it is ordering by the last column in the index. Is it possible to get an indexed lookup on every part of this query, or can you only get 2 of the 3? Can I do any better than index (id, date)?
Plan A: INDEX(id, date) -- works best if when it filters out a lot of rows, making the subsequent "filesort" not very costly.
Plan B: INDEX(col1), which may work best if very few rows are filtered by the WHERE clause. This avoids the filesort, but is not necessarily faster than the other choices here.
Plan C: INDEX(id, date, col1) -- This is a "covering" index if the query does not reference any other fields. The potential advantage here is to look only at the index, and not have to touch the data. If it applies, Plan C is better than Plan A.
You have not provided enough information to say which of these INDEXes will work best. Suggest you add C and B, if "covering" applies; else add A and B. The see which index the Optimizer picks. (There is still a chance that the Optimizer will not pick 'right'.)
(These three indexes are what my Index blog recommends.)
Related
I am using MySQL 5.6 and try to optimize next query:
SELECT t1.field1,
...
t1.field30,
t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = ?
AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime desc;
A response can contain millions of records and every row consists of 31 columns.
Now EXPLAIN says in Extra that planner uses 'Using where'.
I tried to add next index:
create index test_idx ON Table1 (int_field, enum_filed, created_datetime, fk_int);
After that EXPLAIN says in Extra that planner uses "Using index condition; Using filesort"
"rows" value from EXPLAIN with index is less than without it. But in practice time of execution is longer.
So, the questions are next:
What is the best index for this query?
Why EXPLAIN says that 'key_len' of query with index is 5. Shouldn't it be 4+1+8+4=17?
Should the fields from ORDER BY be in index?
Should the fields from JOIN be in index?
try refactor your index this way
avoid (o move to the right after fk_int) the created_datetime column.. and move fk_int before the enum_filed column .. the in this wahy the 3 more colums used for filter shold be use better )
create index test_idx ON Table1 (int_field, fk_int, enum_filed);
be sure you have also an specific index on table2 column pk_int. if you have not add
create index test_idx ON Table2 (int_field, fk_int, enum_filed);
What is the best index for this query?
Maybe (int_field, created_datetime) (See next Q&A for reason.)
Why EXPLAIN says that 'key_len' of query with index is 5. Shouldn't it be 4+1+8+4=17?
enum_filed != defeats the optimizer. If there is only one other value for that enum (and it is NOT NULL), then use = and the other value. And try INDEX(int_field, enum_field, created_datetime) The Optimizer is much happier with = than with any inequality.
"5" could be indicating 2 columns, or it could be indicating one INT that is Nullable. If int_field can be NULL, consider changing it to NOT NULL; then the "5" would drop to "4".
Should the fields from ORDER BY be in index?
Only if the index can completely handle the WHERE. This usually occurs only if all the WHERE tests are =. (Hence, my previous answer.)
Another case for including those columns is "covering"; see next Q&A.
Should the fields from JOIN be in index?
It depends. One thing that gives some performance benefit is to include all columns mentioned anywhere in the SELECT. This is called a "covering" index and is indicated in EXPLAIN by Using index (not Using index condition). There are too many columns in t1 to add a "covering" index. I think the practical limit is about 5 columns.
My guess for your question № 1:
create index my_idx on Table1(int_field, created_datetime desc, fk_int)
or one of these (but neither will probably be worthwhile):
create index my_idx on Table1(int_field, created_datetime desc, enum_filed, fk_int)
create index my_idx on Table1(int_field, created_datetime desc, fk_int, enum_filed)
I'm supposing 3 things:
Table2.pk_int is already a primary key, judging by the name
The where condition on Table1.int_field is only satisfied by a small subset of Table1
The inequality on Table1.enum_filed (I would fix the typo, if I were you) only excludes a small subset of Table1
Question № 2: the key_len refers to the keys used. Don't forget that there is one extra byte for nullable keys. In your case, if int_field is nullable, it means that this is the only key used, otherwise both int_field and enum_filed are used.
As for questions 3 and 4: If, as I suppose, it's more efficient to start the query plan from the where condition on Table1.int_field, the composite index, in this case also with the correct sort order (desc), enables a scan of the index to get the output rows in the correct order, without an extra sort step. Furthermore, adding also fk_int to the index makes the retrieval of any record of Table1 unnecessary unless a corresponding record is present in Table2. For a similar reason you could also add enum_filed to the index, but, if this doesn't considerably reduce the output record count, the increase in index size will make things worse instead of better. In the end, you will have to try it out (with realistic data!).
Note that if you put another column between int_field and created_datetime in the index, the index won't provide the created_datetime (for a given int_field) in the desired output order.
The issue was fixed by adding more filters (to where clause) to the query.
Regarding to indexes 2 proposed indexes were helpful:
From #WalterTross with next index for initial query:
(int_field, created_datetime desc, enum_filed, fk_int)
With my short comment: desc indexes is not supported at MySQL 5.6 - this key word just reserved.
From #RickJames with next index for modified query:
(int_field, created_datetime)
Thanks everyone who tried to help. I really appreciate it.
I am using MySQL 8 version.
Let suppose I have a table i.e. event and there are approx 15 columns in this. Let suppose column names are from a,b,c ... up to m having varchar datatype. This is a master table so having one-to-one records in this. I provided a dashboard for this table and the client can select fields to filter records as per their need.
There are 12 fields on that filter are applying. The query is creating based on selected fields.
If field a is selected then query will be like select * from event where a = <some value>.
If b and c selected then the query will be like select *from event where b= <some value> and c= <some value>
So Can you suggest to me how can I create an index for better optimization?
There is a limit to the number of indexes you can have on a table -- both an absolute limit (64) and a practical limit (much less).
I suggest you start with 12 2-column 'composite' indexes. Have a different starting column for each of the 12. Have a "likely" second column.
Over time, watch what the users typically pick and add/subtract indexes accordingly.
Keep in mind these things:
The Optimizer does not care what order the WHERE clause is.
The Optimizer does care what order the columns of an index are in.
The best index starts with column(s) that are tested for =.
Usually, when a column is tested with a range (eg date, or price), further columns in the index are not useful.
More tips (that you seem to have found): http://mysql.rjweb.org/doc.php/index_cookbook_mysql
INDEX(a,b) will do a pretty good job for WHERE a=1 AND B=2 AND c=3. It won't be as good as INDEX(a,b,c). But I am suggesting that you have to make tradeoffs -- Be happy with "pretty good"; you can't achieve "perfect" in all cases.
I have MariaDB 10.1.14, For a long time I'm doing the following query without problems (it tooks about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, I'm not sure exactly why, this query take about 13 seconds.
This is the EXPLAIN:
And those are the all indexes of transactions table:
What is the reason for the sudden query time increase? and how can I decrease it?
If you are adding more data to your table the query time will increase.
But you can do a few things to improve the performance.
Create a composite index for ( transaction_type, transaction_status, created_at)
Remove the DATE() functions (or any function) from your fields, because that doesn't allow engine use the index. CURRENT_DATE is a constant so there doesn't matter, but isn't necessary because already return DATE
if created_at isnt date you can use
created_at >= CURRENT_DATE and created_at < CURRENT_DATE + 1
or create a different field to only save the date part.
+1 to answer from #JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As #RickJames mentioned in comments, the order of columns is important.
First, columns in equality comparisons
Next, you can index one column that is used for a range comparison (which is anything besides equality), or GROUP BY or ORDER BY. You have both range comparison and GROUP BY, but you can only get the index to help with one of these.
Last, other columns needed for the query, if you think you can get a covering index.
I describe more detail about index design in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
Supposing that it's not important to you which order the rows of the query results return in.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index really supporting the query. You should have a compound index starting with the column with highest cardinality (the date? well, you can try different column orders and see which index the DBMS picks for the query).
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a date part, then you may want to create a computed column created_on only containing the date and index that instead.
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);
I am having a problem with the following task using MySQL. I have a table Records(id,enterprise, department, status). Where id is the primary key, and enterprise and department are foreign keys, and status is an integer value (0-CREATED, 1 - APPROVED, 2 - REJECTED).
Now, usually the application need to filter something for a concrete enterprise and department and status:
SELECT * FROM Records WHERE status = 0 AND enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
The order by is required, since I have to provide the user with the most recent records. For this query I have created an index (enterprise, department, status), and everything works fine. However, for some privileged users the status should be omitted:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
This obviously breaks the index - it's still good for filtering, but not for sorting. So, what should I do? I don't want create a separate index (enterprise, department), so what if I modify the query like this:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;
MySQL definitely does use the index now, since it's provided with values of status, but how quick will the sorting by primary key be? Will it take the recent 10 values for each status available, and then merge them, or will it first merge the ids for each status together, and only after that take the first ten (this way it's gonna be much slower I guess).
All of the queries will benefit from one composite query:
INDEX(enterprise, department, status, id)
enterprise and department can swapped, but keep the rest of the columns in that order.
The first query will use that index for both the WHERE and the ORDER BY, thereby be able to find the 10 rows without scanning the table or doing a sort.
The second query is missing status, so my index is less than perfect. This would be better:
INDEX(enterprise, department, id)
At that point, it works like above. (Note: If the table is InnoDB, then this 3-column index is identical to your 2-column INDEX(enterprise, department) -- the PK is silently included.)
The third query gets dicier because of the IN. Still, my 4 column index will be nearly the best. It will use the first 3 columns, but not be able to do the ORDER BY id, so it won't use id. And it won't be able to comsume the LIMIT. Hence the EXPLAIN will say Using temporary and/or Using filesort. Don't worry, performance should still be nice.
My second index is not as good for the third query.
See my Index Cookbook.
"How quick will sorting by id be"? That depends on two things.
Whether the sort can be avoided (see above);
How many rows in the query without the LIMIT;
Whether you are selecting TEXT columns.
I was careful to say whether the INDEX is used all the way through the ORDER BY, in which case there is no sort, and the LIMIT is folded in. Otherwise, all the rows (after filtering) are written to a temp table, sorted, then 10 rows are peeled off.
The "temp table" I just mentioned is necessary for various complex queries, such as those with subqueries, GROUP BY, ORDER BY. (As I have already hinted, sometimes the temp table can be avoided.) Anyway, the temp table comes in 2 flavors: MEMORY and MyISAM. MEMORY is favorable because it is faster. However, TEXT (and several other things) prevent its use.
If MEMORY is used then Using filesort is a misnomer -- the sort is really an in-memory sort, hence quite fast. For 10 rows (or even 100) the time taken is insignificant.
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can i get that without using "order-by". It is a very big table and using order by on entire table to get only "n" records is not so convincing.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemende as Top-N query supported by an appropriate index, that's running very fast because it doesn't need to sort the entire table, not even the selecte rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
without appropriate index it will be slow if you are having lot's of data. However, if you create an index like that:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
I few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before but try and create an index on the creation_date column. Which will automatically sort the rows is ascending order. Then your select query can use the orderby creation_date desc with the Limit 20 to get the first 20 records. The database engine should realize the index has already done the work sorting and wont actually need to sort, because the index has already sorted it on save. All it needs to do is read the last 20 records from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use Group By if your grouping some data and then Having clause to select specific records.