I have a query SELECT.. WHERE user_id='' && date>:expire && used=0;
When creating the index, should I put all the columns together in one index, like
CREATE INDEX new_index ON table (user_id, date, used)
or should I create a separate index for each column?
This really depends on how you plan to query these columns. Your safest bet would be to just make an index for each column, though this may not yield optimum performance.
Multi-column indexes are useful for cases where:
You need to enforce a unique constraint across the combination of column values
You know that you will always use the columns in joins, WHERE clauses, ORDER BY, GROUP BY, etc. in a specific combination
For example, with the combined index you proposed, (user_id, date, used), you would be able to use the index only in the following cases:
Your JOIN, WHERE, etc. uses only user_id
Your JOIN, WHERE, etc. uses user_id and date
Your JOIN, WHERE, etc. uses all three columns
You would not be able to use the index in these cases:
Your JOIN, WHERE, etc. uses date or used individually
Your JOIN, WHERE, etc. uses date and used
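To see why only those prefixes work, here is a minimal Python sketch that models the proposed (user_id, date, used) index as a sorted list of tuples, roughly how the entries of a B-tree are ordered. It is a simplification (no pages, no optimizer), and the sample rows and the helper prefix_range are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Model of a composite index on (user_id, date, used): entries kept in sorted order.
index = sorted([
    (1, 20240101, 0), (1, 20240105, 1), (1, 20240110, 0),
    (2, 20240101, 0), (2, 20240107, 0), (3, 20240103, 1),
])

def prefix_range(entries, prefix):
    """Return the contiguous slice of entries whose leading columns equal `prefix`.

    This only works because the constrained columns form a leftmost prefix:
    the tuples are sorted by user_id first, then date, then used.
    """
    lo = bisect_left(entries, prefix)
    # prefix + (inf,) sorts after every tuple that starts with `prefix` (values are ints).
    hi = bisect_right(entries, prefix + (float("inf"),))
    return entries[lo:hi]

# Equality on user_id, or on user_id and date: one contiguous slice found by binary search.
print(prefix_range(index, (1,)))            # the three entries for user_id = 1
print(prefix_range(index, (1, 20240105)))   # [(1, 20240105, 1)]

# Equality on date alone: matching entries are scattered, so every entry must be checked.
positions = [i for i, e in enumerate(index) if e[1] == 20240101]
print(positions)                            # [0, 3]
```

A condition on user_id (optionally plus date) maps to one slice of the sorted entries; a condition on date alone matches entries spread through the whole index.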
For further reading, here is the MySQL documentation on multi-column indexes:
http://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html
Based on your query, you should use a multi-column index. However, the right index is:
CREATE INDEX new_index ON table (user_id, used, date)
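Note that date is last in the index because its condition is an inequality (a range): the columns compared with = come first, so that all the matching entries form one contiguous range of the index.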
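The benefit of putting the inequality last can be sketched by modelling each candidate index as a sorted list of tuples and counting how many entries a range scan walks for WHERE user_id = 7 AND date > 100 AND used = 0. The rows, the constants, and the function names are hypothetical, and the model ignores MySQL details such as index condition pushdown, which can check used inside the index but still has to visit those entries.

```python
from bisect import bisect_right

# Hypothetical rows: (user_id, date, used). The query is
#   WHERE user_id = 7 AND date > 100 AND used = 0
rows = [(7, d, d % 2) for d in range(90, 130)] + [(8, d, 0) for d in range(90, 130)]
USER, EXPIRE = 7, 100

def scan_user_used_date(rows):
    """Index (user_id, used, date): both equalities first, then one range on date."""
    idx = sorted((u, used, d) for u, d, used in rows)
    pos = bisect_right(idx, (USER, 0, EXPIRE))  # first entry with date > EXPIRE
    examined, matched = 0, 0
    while pos < len(idx) and idx[pos][:2] == (USER, 0):
        examined += 1
        matched += 1  # used = 0 and date > EXPIRE are guaranteed by the slice
        pos += 1
    return examined, matched

def scan_user_date_used(rows):
    """Index (user_id, date, used): the range on date comes before used."""
    idx = sorted(rows)  # rows are already (user_id, date, used)
    pos = bisect_right(idx, (USER, EXPIRE, float("inf")))  # first entry with date > EXPIRE
    examined, matched = 0, 0
    while pos < len(idx) and idx[pos][0] == USER:
        examined += 1
        if idx[pos][2] == 0:  # `used` can only be checked entry by entry
            matched += 1
        pos += 1
    return examined, matched

print(scan_user_used_date(rows))  # (14, 14)
print(scan_user_date_used(rows))  # (29, 14)
```

With (user_id, used, date) every visited entry is a match; with (user_id, date, used) the scan also walks the entries with used = 1.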
MySQL's documentation has a very good discussion of multi-column indexes and how they are used.
I have a table with a billion+ rows, and I frequently execute the query below:
SELECT SUM(price) FROM mytable WHERE domain IN ('com') AND url LIKE '%/shop%' AND date BETWEEN '2001-01-01' AND '2007-01-01';
Here domain is varchar(10), url is varchar(255), and price is float. I understand that a LIKE with a leading wildcard ('%...%') cannot use an index. So I created an index on price, domain, and date:
create index price_date on mytable(price, domain, date)
The problem persists: this index is not used either, because the query contains url LIKE '%.com/shop%'.
On the other hand, a FULLTEXT index will not work either, since I have other non-text filters in the query.
How can I optimise the above query? I have too many rows not to use an index.
UPDATE
Is this an SQL limitation? Could such a query perform better on a NoSQL database?
You have two range conditions: one uses IN() and the other uses BETWEEN. The best you can hope for is that the condition on the first column of the index is used to narrow down the rows examined, and that the condition on the second column of the index uses index condition pushdown, so the storage engine does some pre-filtering.
Then it's up to you to choose which column should be first in the index, based on how much each condition narrows down the search. If your condition on date is more likely to reduce the set of examined rows, put date first in the index definition.
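The choice between the two orders can be illustrated with a small Python model that counts index entries visited when only the leading column bounds the scan. To keep the model faithful to "two range conditions", it uses two hypothetical integer columns x and y, each filtered with BETWEEN, rather than the real domain and date columns; the data is made up.

```python
from itertools import product

# Hypothetical table with two integer columns x and y (every combination present).
rows = list(product(range(100), range(10)))  # 1000 rows of (x, y)

def scan_cost(index_entries, lead_ok, second_ok):
    """Count entries visited and matched when only the leading column bounds the scan.

    In sorted order the entries whose leading column satisfies its range form one
    contiguous run, so a B-tree can seek to the start of that run. The second
    column's range cannot shrink the run; it is only checked entry by entry
    (roughly what index condition pushdown does).
    """
    visited = [e for e in index_entries if lead_ok(e[0])]
    matched = [e for e in visited if second_ok(e[1])]
    return len(visited), len(matched)

def x_ok(x):
    return 10 <= x <= 59  # WHERE x BETWEEN 10 AND 59 (half of the x values)

def y_ok(y):
    return 3 <= y <= 4    # AND y BETWEEN 3 AND 4 (a fifth of the y values)

index_xy = sorted(rows)                     # INDEX(x, y)
index_yx = sorted((y, x) for x, y in rows)  # INDEX(y, x)

print(scan_cost(index_xy, x_ok, y_ok))  # (500, 100)
print(scan_cost(index_yx, y_ok, x_ok))  # (200, 100)
```

Both orders find the same 100 rows, but putting the more selective range first (y here) visits 200 entries instead of 500.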
The order of terms in the WHERE clause does not have to match the order of columns in the index.
MySQL does not support optimizing with both a fulltext index and a B-tree index on the same table reference in the same query.
You can't use a fulltext index anyway for the pattern you are searching for. Fulltext indexes don't allow searches for punctuation characters, only words.
I vote for this order:
INDEX(domain, -- first because of "="
date, -- then range
url, price) -- "covering"
But since the constants look like they would hit most of the billion rows, I don't expect good performance.
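Why a "covering" INDEX(domain, date, url, price) helps can be sketched in Python: the scan seeks on domain, walks the date range in order, tests url inside the index, and sums price without ever reading a table row. The entries are hypothetical, and LIKE '%/shop%' is approximated by a plain substring test (ignoring collation and case rules).

```python
from bisect import bisect_left
from datetime import date

# Hypothetical entries of INDEX(domain, date, url, price), kept sorted.
# Because url and price live in the index, the query never reads the table rows.
index = sorted([
    ("com", date(2000, 6, 1), "a.com/shop/1", 10.0),
    ("com", date(2003, 2, 1), "b.com/shop/2", 5.0),
    ("com", date(2004, 7, 9), "c.com/blog", 99.0),
    ("com", date(2006, 12, 31), "d.com/shop", 2.5),
    ("com", date(2008, 1, 1), "e.com/shop", 7.0),
    ("org", date(2003, 3, 3), "f.org/shop", 1.0),
])

def sum_price(entries, domain, start, end, needle):
    """SUM(price) WHERE domain = ? AND date BETWEEN start AND end AND url LIKE '%needle%'."""
    total = 0.0
    # Seek to the first entry with this domain and date >= start ("=" column, then range column).
    pos = bisect_left(entries, (domain, start))
    while pos < len(entries) and entries[pos][0] == domain and entries[pos][1] <= end:
        _, _, url, price = entries[pos]
        if needle in url:  # the LIKE cannot be used to seek; it is checked on each entry
            total += price
        pos += 1
    return total

print(sum_price(index, "com", date(2001, 1, 1), date(2007, 1, 1), "/shop"))  # 7.5
```

The url test still runs on every entry in the date range, which is why a range that covers most of the table stays slow even with this index.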
If this is a common query and/or "shop" is one of only a few possible filters, we can discuss whether a summary table would be useful.
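A summary table along those lines might look like the following Python sketch, with a dictionary standing in for a small MySQL table keyed by (domain, day, is_shop). The layout and the function names are assumptions for illustration, and the approach only works if the "shop" filter is known ahead of time, because it is evaluated once when each row is recorded.

```python
from collections import defaultdict
from datetime import date

# Hypothetical summary table: one row per (domain, day, is_shop) holding the summed price.
summary = defaultdict(float)

def record_sale(domain, day, url, price):
    """Update the summary whenever a row is inserted into the big table."""
    summary[(domain, day, "/shop" in url)] += price

def shop_total(domain, start, end):
    """Answer the report from the small summary instead of scanning the big table."""
    return sum(total for (d, day, is_shop), total in summary.items()
               if d == domain and is_shop and start <= day <= end)

record_sale("com", date(2003, 2, 1), "b.com/shop/2", 5.0)
record_sale("com", date(2003, 2, 1), "x.com/shop", 1.5)
record_sale("com", date(2004, 7, 9), "c.com/blog", 99.0)
record_sale("org", date(2003, 3, 3), "f.org/shop", 1.0)

print(shop_total("com", date(2001, 1, 1), date(2007, 1, 1)))  # 6.5
```

The two "com" shop sales on the same day collapse into one summary row, so the report reads far fewer rows than the original table holds.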
I have a table with the following columns:
id-> PK
customer_id-> index
store_id-> index
order_date-> index
last_modified-> index
other_columns...
other_columns...
I have three single-column indexes. I also have a customer_id_store_id index, which is used by a foreign key constraint referencing other tables.
id, customer_id, and store_id are char(36) columns holding UUIDs. order_date is a datetime and last_modified is a UNIX timestamp.
I want to gain some performance by removing all these indexes and adding a single one on (customer_id, store_id, order_date). Most queries will have these fields in the WHERE clause, but sometimes store_id will not be needed.
What is the best approach: adding "store_id IS NOT NULL" to the WHERE clause, or creating the index as (customer_id, order_date, store_id)?
I also frequently need to query the table by the last_modified field (the WHERE clause includes customer_id=, store_id=, last_modified>).
Since I only have a single-column index on it, and hundreds of customers are inserting/updating rows in the table, the index often scans more rows than necessary. Is it better to create another index on (customer_id, store_id, last_modified), or leave it as it is? Or should I add this column to the previous index, making it a four-column composite index? But then again, order_date is irrelevant here, and omitting it might result in the index not being used as intended.
The query is fast for customers that don't have many rows, possibly because it uses the customer_id index. But for customers with a large amount of data, this isn't optimal. Most often I need only a few days of data.
Can anyone please advise what the best index is in this scenario?
It is true that lots of single column indexes on a MySQL table are generally considered harmful.
A query with
WHERE customer_id=constant AND store_id=constant AND last_modified>=constant
will be accelerated by an index on (customer_id, store_id, last_modified). Why? The MySQL query planner can random-access the index to the first item it needs to retrieve, then scan the index sequentially. That same index works for
WHERE customer_id=constant AND store_id=constant
AND last_modified>=constant
AND last_modified< constant + INTERVAL 1 DAY
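The "random-access, then scan sequentially" behaviour can be sketched in Python with bisect over a sorted list of (customer_id, store_id, last_modified) tuples. The entries and the function name modified_between are hypothetical; passing float("inf") as end gives the open-ended last_modified >= constant form.

```python
from bisect import bisect_left

# Hypothetical entries of INDEX(customer_id, store_id, last_modified), kept sorted.
index = sorted([
    ("c1", "s1", 100), ("c1", "s1", 150), ("c1", "s1", 230), ("c1", "s1", 320),
    ("c1", "s2", 120), ("c2", "s1", 140), ("c2", "s1", 260),
])

def modified_between(entries, customer, store, start, end):
    """WHERE customer_id = customer AND store_id = store
    AND last_modified >= start AND last_modified < end"""
    pos = bisect_left(entries, (customer, store, start))  # random access to the first candidate
    hits = []
    while pos < len(entries):                             # then read the index sequentially
        c, s, ts = entries[pos]
        if (c, s) != (customer, store) or ts >= end:
            break                                         # past the range: stop
        hits.append(ts)
        pos += 1
    return hits

print(modified_between(index, "c1", "s1", 150, 300))           # [150, 230]
print(modified_between(index, "c1", "s1", 150, float("inf")))  # [150, 230, 320]
```

Only the entries inside the requested range are touched; the scan stops as soon as it reaches a different (customer_id, store_id) pair or a too-large timestamp.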
BUT, that index will not be useful for a query with just
WHERE store_id=constant AND last_modified>constant
or
WHERE customer_id=constant AND store_id IS NOT NULL AND last_modified>=constant
For the first of those query patterns you need (store_id, last_modified) to achieve the ability to sequentially scan the index.
The second of those query patterns requires two different range searches. One is something IS NOT NULL. That's a range search because it has to romp through all the non-null values in the column. The second range search is last_modified>=constant. That's a range search, because it starts with the first value of last_modified that meets the given criterion, and scans to the end of the index.
MySQL indexes are B-trees. That means, essentially, that they're sorted into a single particular order. So an index is best at accelerating queries that require just one range search, which makes the second query pattern inherently hard to satisfy with an index.
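The difficulty with the IS NOT NULL pattern shows up in a small Python model: the entries that satisfy all three conditions are not adjacent in (customer_id, store_id, last_modified) order, so no single range scan returns exactly them. The data is hypothetical and contains no NULL store_id values.

```python
# Hypothetical entries of INDEX(customer_id, store_id, last_modified), kept sorted.
index = sorted([
    ("c1", "s1", 100), ("c1", "s1", 250),
    ("c1", "s2", 120), ("c1", "s2", 300),
    ("c2", "s1", 260),
])

# WHERE customer_id = 'c1' AND store_id IS NOT NULL AND last_modified >= 200
# (no store_id is NULL in this toy data, so the IS NOT NULL range covers all of c1).
matches = [i for i, (c, s, ts) in enumerate(index)
           if c == "c1" and s is not None and ts >= 200]
print(matches)  # [1, 3]: not one contiguous run

# The best a single range scan can do is walk every entry for c1 and filter last_modified.
visited = [i for i, (c, _, _) in enumerate(index) if c == "c1"]
print(visited)  # [0, 1, 2, 3]
```

Entry 2 sits between the two matches but fails the last_modified test, so the second range cannot be used to skip it.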
A table can have multiple compound indexes designed to satisfy multiple different query patterns. That's usually the strategy for making large tables work well in practical applications. Each index imposes a small performance penalty on inserts and updates. Indexes also take storage space, but storage is very cheap these days.
If you want to use a compound index to search on multiple criteria, these things must be true:
all but one of the criteria must be equality criteria like store_id = constant.
one criterion can be a range-scan criterion like last_modified >= constant or something IS NOT NULL.
the columns in the index must be ordered so that all the columns involved in equality criteria appear first, followed by the column involved in the range-scan criterion.
you may put other columns after the range-scan column, but they are part of a covering index strategy (beyond the scope of this post).
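Those rules can be written as a small Python function that reports which leading index columns can bound a single range scan. It is a simplification: it ignores covering indexes, index condition pushdown, and the skip scan access method available in MySQL 8.0.13 and later, and the function name usable_prefix is hypothetical.

```python
def usable_prefix(index_columns, conditions):
    """Return the leading index columns that can bound one range scan.

    `conditions` maps a column name to "eq" (column = constant) or "range"
    (>, >=, <, BETWEEN, IS NOT NULL, ...). Columns without a condition are absent.
    """
    usable = []
    for col in index_columns:
        kind = conditions.get(col)
        if kind == "eq":
            usable.append(col)  # equality columns keep extending the prefix
        elif kind == "range":
            usable.append(col)  # at most one range column, and it ends the prefix
            break
        else:
            break               # a column with no condition ends the prefix
    return usable

idx = ["customer_id", "store_id", "last_modified"]
print(usable_prefix(idx, {"customer_id": "eq", "store_id": "eq", "last_modified": "range"}))
# ['customer_id', 'store_id', 'last_modified']
print(usable_prefix(idx, {"customer_id": "eq", "store_id": "range", "last_modified": "range"}))
# ['customer_id', 'store_id']  (IS NOT NULL is a range, so last_modified cannot narrow the scan)
print(usable_prefix(idx, {"store_id": "eq", "last_modified": "range"}))
# []  (customer_id, the leftmost column, has no condition)
```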
http://use-the-index-luke.com/ is a good basic intro to the black art of indexing.
I have a table with two partitions. The partitions are pactive = 1 and pinactive = 0. I understand that two partitions don't give much of a gain, but I use them so I can truncate and reload one partition while doing plain inserts into the other.
The problem comes when I create indexes.
The query is:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Index created:
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
Around 200,000 records will be retrieved by the above query, but with this index the query takes 30+ seconds. What is the reason for such a long time? Also, if I remove partitionflag from the index, the index is not used at all.
Also, is the following understanding correct?
Even with partitions available, the optimizer needs the partitioning column to be mentioned in the index definition, so that it only hits the required partition.
Any ideas that help me understand this would be very helpful.
You can optimize your index by reordering its columns. Usually the columns in an index are ordered by their cardinality, from highest to lowest. Cardinality is the number of distinct values in the given column. So in your case, I suppose companyid has many distinct values in the customformattributes table, while partitionflag has a cardinality of 2 (if 1 and 0 are the only options for this column).
Your query will first filter all the rows with partitionflag=0, then filter by companyid, and so on.
When you removed partitionflag from the index, the query did not use the index, probably because the optimizer decided that a full table scan would be faster than using the index (in most cases the optimizer is right).
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be better:
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use the index, the following rule must be met: the leftmost column of the index must be present in the WHERE clause. Depending on the MySQL version you are using, additional requirements may apply. For example, with an old version of MySQL you may need to list the conditions in the WHERE clause in the same order as the columns in the index. In recent versions of MySQL, the query optimizer takes care of matching the WHERE conditions to the index columns.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be optimal for the given query.
For the second question, about partitioning: the general rule is that the partitioning column must be part of every UNIQUE key in the table (the primary key is also a unique key by definition, so the column must be added to the PK as well). If the table structure and logic allow you to add the partitioning column to all the UNIQUE indexes in the table, add it and then partition the table.
When partitioning is done correctly, you can take advantage of partition pruning: the SELECT query searches only the partitions where the requested data is stored (otherwise it looks in all partitions).
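Partition pruning can be illustrated with a toy Python model in which each partition holds its own rows, keyed by partitionflag. In this model the decision to skip a partition comes from the WHERE condition on the partitioning column, not from any index definition. The rows and the function name query_rows are hypothetical.

```python
# Hypothetical model: each partition stores its own rows of (companyid, activityname).
partitions = {
    0: [(47, "Activity 1"), (47, "Activity 2"), (12, "Activity 1")],
    1: [(47, "Activity 1"), (99, "Activity 3")],
}

def query_rows(companyid, activityname, partitionflag=None):
    """Return (partitions read, matching rows as (partitionflag, companyid, activityname))."""
    if partitionflag is not None:
        to_read = [partitionflag]     # pruning: only the partition that can hold matches
    else:
        to_read = sorted(partitions)  # no condition on the partitioning column: read all
    rows = [(flag, c, a) for flag in to_read for c, a in partitions[flag]
            if c == companyid and a == activityname]
    return to_read, rows

print(query_rows(47, "Activity 1", partitionflag=0))  # ([0], [(0, 47, 'Activity 1')])
print(query_rows(47, "Activity 1"))                   # ([0, 1], [(0, 47, 'Activity 1'), (1, 47, 'Activity 1')])
```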
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used: as a "covering" index, or as a way to look up the data. When the index is not "covering", the optimizer chooses between using the index, bouncing back and forth between the index and the data, and simply ignoring the index and scanning the table.
I have a query that uses ORDER BY and GROUP BY:
select count(*), field2
from table1 where field1>x group by field2 order by count(*) desc
What are the best indexes for this query?
Should I index field1 and field2 separately or together?
You should create the index with both columns, in two different orders:
ALTER TABLE table1 ADD INDEX field1_field2_ndx (field1,field2);
ALTER TABLE table1 ADD INDEX field2_field1_ndx (field2,field1);
You should not create individual indexes, because an index containing both columns lets the query be satisfied by reading the index alone. It would never need to touch the table.
Even if you made individual indexes, the Query Optimizer would choose the two column index anyway.
Now that you have the two indexes, just trust the Query Optimizer to select the correct index. Based on the query, the EXPLAIN plan would choose the field2_field1_ndx index.
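One way to see why an index on (field2, field1) suits this query is the following Python sketch: the entries arrive grouped by field2, field1 is available in each entry for the field1 > x filter, and the counts come out of one pass over the index. The data and x = 4 are hypothetical. ORDER BY COUNT(*) DESC still needs a separate sort of the grouped results, because the counts are not known until the groups are built.

```python
from itertools import groupby

# Hypothetical entries of INDEX(field2, field1), kept sorted by field2 first.
index = sorted([("b", 5), ("a", 7), ("b", 9), ("c", 2), ("a", 8), ("b", 6)])
x = 4  # the constant from WHERE field1 > x

# SELECT COUNT(*), field2 FROM table1 WHERE field1 > x GROUP BY field2 ORDER BY COUNT(*) DESC
# Entries arrive grouped by field2, so each group's count is computed in one pass.
counts = [(sum(1 for _, f1 in group if f1 > x), f2)
          for f2, group in groupby(index, key=lambda e: e[0])]
counts = [(n, f2) for n, f2 in counts if n > 0]  # GROUP BY only returns groups that have rows
counts.sort(key=lambda t: t[0], reverse=True)     # ORDER BY COUNT(*) DESC still needs a sort
print(counts)  # [(3, 'b'), (2, 'a')]
```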
I have indexes structured as follows:
BTREE merchant_id
BTREE flag
BTREE test (merchant_id, flag)
I run a SELECT query like this:
SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1
Which index would be better? Does it matter if they are in separate indexes?
To answer questions such as "Which column would be better to index?", and "Is the query planner using a certain index to execute the query?", you can use the EXPLAIN statement. See the excellent article Analyzing Queries for Speed with EXPLAIN for a comprehensive overview of the use of EXPLAIN in optimizing queries and schema.
In general, where a query can be optimized by indexing one of several columns, a helpful rule of thumb is to index the column that is "most unique" or "most selective" over all records; that is, index the column that has the largest number of distinct values over all rows. I am guessing that in your case the merchant_id column contains the most unique values, so it should probably be indexed. You can verify that an index choice is optimal by running EXPLAIN on all variations of the query.
Note that the rule of thumb "index the most selective column" does not necessarily apply to the choice of the first column of a composite (also called compound or multi-column) index. It depends on your queries. If, for example, employee_id is the most selective column, but you need to execute queries like SELECT * FROM badges WHERE flag = 17, then having as the only index on table badges the composite index (employee_id, flag) would mean that the query results in a full table scan.
Of the three indexes, you don't really need the separate merchant_id index, since lookups on merchant_id can use your "test" index.
More details:
http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html