I have indexes structured like so:
BTREE merchant_id
BTREE flag
BTREE test (merchant_id, flag)
I run a SELECT query like this:
SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1
Which index would be better? Does it matter whether the columns are in separate indexes?
To answer questions such as "Which column would be better to index?", and "Is the query planner using a certain index to execute the query?", you can use the EXPLAIN statement. See the excellent article Analyzing Queries for Speed with EXPLAIN for a comprehensive overview of the use of EXPLAIN in optimizing queries and schema.
In general, where a query can be optimized by indexing one of several columns, a helpful rule of thumb is to index the column that is "most unique" or "most selective" over all records; that is, index the column that has the greatest number of distinct values over all rows. I am guessing that in your case, the merchant_id column contains the largest number of unique values, so it should probably be indexed. You can verify that an index choice is optimal by running EXPLAIN on the query for each variation.
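For example, a quick way to compare the candidate indexes is to run EXPLAIN on the query and, if you like, force each index in turn (the index name test comes from your definitions; the key and rows columns of the output show which index the planner picked and how many rows it expects to examine):

EXPLAIN SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges WHERE merchant_id = 1 AND flag = 1;

-- Force a particular candidate to compare its row estimate:
EXPLAIN SELECT badge_id, merchant_id, badge_title, badge_class
FROM badges FORCE INDEX (test)
WHERE merchant_id = 1 AND flag = 1;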
Note that the rule of thumb "index the most selective column" does not necessarily apply to the choice of the first column of a composite (also called compound or multi-column) index. It depends on your queries. If, for example, merchant_id is the most selective column, but you need to execute queries like SELECT * FROM badges WHERE flag = 17, then having the composite index (merchant_id, flag) as the only index on table badges would mean that the query results in a full table scan.
Of your three indexes, you don't really need the separate merchant_id index, since merchant_id lookups can use your "test" (merchant_id, flag) index.
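Dropping the redundant single-column index would look something like this (the index name merchant_id is an assumption; check SHOW INDEX FROM badges for the actual name):

ALTER TABLE badges DROP INDEX merchant_id;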
More details:
http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
I have a table with the following columns:
id-> PK
customer_id-> index
store_id-> index
order_date-> index
last_modified-> index
other_columns...
other_columns...
I have separate single-column indexes on those columns. I also have a customer_id_store_id index, which is a foreign key constraint referencing other tables.
id, customer_id and store_id are char(36) UUIDs. order_date is a datetime and last_modified is a UNIX timestamp.
I want to gain some performance by removing all of these indexes and adding a single composite index on (customer_id, store_id, order_date). Most queries will have these fields in the WHERE clause, but sometimes store_id will not be needed.
What is the best approach: to add "store_id IS NOT NULL" to the WHERE clause, or to create the index as (customer_id, order_date, store_id)?
I also frequently need to query the table by the last_modified field (the WHERE clause includes customer_id =, store_id =, last_modified >).
As I only have a single-column index on it, and there are hundreds of customers inserting/updating the table, the index often scans more rows than necessary. Is it better to create another index (customer_id, store_id, last_modified), or leave it as it is? Or should I add this column to the previous index, making it a four-column composite index? But then order_date is irrelevant here, and omitting it might result in the index not being used as intended.
The query runs fast for customers that don't have many rows, possibly using the customer_id index there. But for customers with large amounts of data this isn't optimal. More often than not I need only a few days of data.
Can anyone please advise what the best index would be in this scenario?
It is true that lots of single-column indexes on a MySQL table are generally considered harmful.
A query with
WHERE customer_id=constant AND store_id=constant AND last_modified>=constant
will be accelerated by an index on (customer_id, store_id, last_modified). Why? The MySQL query planner can random-access the index to the first item it needs to retrieve, then scan the index sequentially. That same index works for
WHERE customer_id=constant AND store_id=constant
AND last_modified>=constant
AND last_modified< constant + INTERVAL 1 DAY
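For reference, creating that composite index would look something like this (the index name is a placeholder, and orders stands in for your table since the question doesn't name it):

CREATE INDEX idx_cust_store_modified
    ON orders (customer_id, store_id, last_modified);
-- Equality columns (customer_id, store_id) come first,
-- the range column (last_modified) comes last.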
BUT, that index will not be useful for a query with just
WHERE store_id=constant AND last_modified>constant
or
WHERE customer_id=constant AND store_id IS NOT NULL AND last_modified>=constant
For the first of those query patterns you need an index on (store_id, last_modified) to get the ability to scan the index sequentially.
The second of those query patterns requires two different range searches. One is something IS NOT NULL. That's a range search because it has to romp through all the non-null values in the column. The second range search is last_modified>=constant. That's a range search, because it starts with the first value of last_modified that meets the given criterion, and scans to the end of the index.
MySQL indexes are B-trees. That means, essentially, that they're sorted into a particular single order. So, an index is best for accelerating queries that require just one range search. So, the second query pattern is inherently hard to satisfy with an index.
A table can have multiple compound indexes designed to satisfy multiple different query patterns. That's usually the strategy for making large tables work well in practical applications. Each index imposes a little bit of performance penalty on updates and inserts. Indexes also take storage space. But storage is very cheap these days.
If you want to use a compound index to search on multiple criteria, these things must be true:
all but one of the criteria must be equality criteria like store_id = constant.
one criterion can be a range-scan criterion like last_modified >= constant or something IS NOT NULL.
the columns in the index must be ordered so that the columns involved in equality criteria all appear first, followed by the column involved in the range-scan criterion (a sketch follows this list).
you may mention other columns after the range scan criterion. But they make up part of a covering index strategy (beyond the scope of this post).
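Combined with the earlier (customer_id, store_id, last_modified) index, an index like the one below would let the table serve both query patterns discussed above (again with placeholder table and index names):

-- Serves WHERE store_id = constant AND last_modified > constant
CREATE INDEX idx_store_modified
    ON orders (store_id, last_modified);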
http://use-the-index-luke.com/ is a good basic intro to the black art of indexing.
I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all the columns you need for your query (and possibly more).
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contains the columns column1, column2 and column3, then for this SQL:
SELECT column1, column2
FROM tablename
WHERE criteria
provided that particular index can be used to resolve which rows to retrieve, the index already contains the values of the columns you're interested in, so the engine won't have to go to the table to retrieve the rows; it can produce the results directly from the index.
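As a sketch, a covering index for that query could be declared like this (tablename and the column names are the placeholders from the example; the assumption here is that the criteria filter on column3):

CREATE INDEX idx_covering ON tablename (column3, column1, column2);
-- column3 lets the index resolve the WHERE criteria; column1 and column2
-- are carried along so the SELECT can be answered from the index alone.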
You can also apply this when you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns to the select list: it can be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article on the subject: Index Covering Boosts SQL Server Query Performance.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without any need to touch the table data.
example:
CREATE TABLE MyTable
(
    ID INT IDENTITY PRIMARY KEY,
    Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL Server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions: columns restricted by a parameterized or constant condition.
join columns: columns dynamically used for joining.
selected columns: columns whose values are returned by the query.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/update overhead, due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below; you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it is using an index, and whether it performs efficiently without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the server has to fetch rows from the table, which means extra I/O calls. So the query is not covered by the index, because a column used in the query is not part of the index, and that leads to frequent I/O calls.
To make it a covering index you need to create a composite index on (Id, Telephone_Number).
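A minimal sketch of adding that composite index (the index name is made up):

ALTER TABLE mytable ADD INDEX idx_id_phone (Id, Telephone_Number);
-- The query SELECT Id ... WHERE Telephone_Number = ... can now be answered
-- by scanning this index alone, without fetching rows from the table.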
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/
I have a table with two partitions. The partitions are pactive = 1 and pinactive = 0. I understand that two partitions do not give much of a gain, but I have used them so I can truncate-and-load one partition and do plain inserts into the other.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
There are around 200000 records that will be retrieved by the above query, but the query, even with the index above, takes 30+ seconds. What is the reason for such a long time? Also, if I remove partitionflag from the index, the index is not even used.
And is this understanding correct: even with the partitions available, the optimizer needs the partitioning column mentioned in the index definition so that it only hits the required partition?
Any ideas on understanding this would be very helpful.
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by their cardinality (starting from the highest and going down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many different values of companyid in the customformattributes table, while partitionflag will have a cardinality of 2 (if the only options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you removed partitionflag from the index, the query did not use the index, probably because the optimizer decided it would be faster to do a full table scan instead of using the index (in most cases the optimizer is right).
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index would probably be better:
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For a query to use an index, the following rule must be met: the leftmost column in the index should be present in the WHERE clause. Depending on the MySQL version you are using, additional requirements may apply. For example, on very old versions of MySQL you may need to list the columns in the WHERE clause in the same order they appear in the index; in recent versions the query optimizer takes care of ordering the conditions correctly itself.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be optimal for the given query.
For the second question, about the partitioning: the common rule is that the column you are partitioning by must be part of every UNIQUE key in the table (the primary key is also a unique key by definition, so the column should be added to the PK as well). If the table structure and logic allow you to add the partitioning column to all the UNIQUE indexes in the table, then add it and partition the table.
When the partitioning is done correctly you can take advantage of partition pruning - this is when a SELECT query searches only the partitions where the matching data can be stored (otherwise it looks in all partitions).
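One way to check that pruning actually happens is EXPLAIN PARTITIONS (valid on MySQL 5.6; later versions show the partitions column in plain EXPLAIN). This sketch assumes the table is partitioned on partitionflag as described in the question:

EXPLAIN PARTITIONS
SELECT partitionflag, companyid, activityname
FROM customformattributes
WHERE companyid = 47
  AND activityname = 'Activity 1'
  AND partitionflag = 0;
-- If pruning works, the partitions column lists only the partition
-- holding partitionflag = 0 (pinactive in the question's setup).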
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
I'm trying to make my understanding of indexes in MySQL clearer. What I know is that indexes are used to make queries faster. Beyond that, I have a couple of questions.
Let's say I have a query:
SELECT books.name, books.name2, books.id, books.image, books.faith, books.topic,
       books.downloaded, books.viewed, books.language, books.size,
       books.author as author_id, authors.name as author_name, authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.id = '".$id."' AND status = 1
Is any index applicable for this SELECT query when it has a JOIN?
After making an index on a column, will queries on that column be optimized automatically, or do they need to be changed?
How do I make an index for this query, and on which column?
What are the other benefits or disadvantages of using indexes?
In which cases should indexes be avoided, and where should they be used more?
Are indexes applicable to random queries?
Are indexes more efficient on IDs?
Please advise, thank you in advance!
You can check the details at the links below.
https://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://mydbsolutions.in/query-optimization-2/
Taking your questions in turn:
Is any index applicable for this SELECT query when it has a JOIN?
It will depend on various factors. If your table has very little data, or roughly 70% of the rows have the same value in the indexed column, MySQL will prefer to scan the table instead of using the index. In general, all your join columns should be indexed (they will be indexed automatically if you use foreign keys; otherwise you should index them yourself). The column your query uses to filter out the most data should also be indexed. In your case you are filtering on books.id, which should be the primary key and is therefore already indexed.
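As a sketch of the "index your join columns" advice (only needed if these columns are not already indexed, which the question doesn't say):

ALTER TABLE authors ADD INDEX idx_aid (aid);      -- unnecessary if aid is already the primary key
ALTER TABLE books ADD INDEX idx_author (author);  -- helps joins that look up books by author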
After making an index on a column, will queries on that column be optimized automatically, or do they need to be changed?
The index will start to be used automatically, but in some cases you may need to change your query. Suppose you use a filter condition such as date(order_date)='2015-10-15': even after creating an index on order_date, the index will not be used, so you have to rewrite the condition as order_date>='2015-10-15 00:00:00' and order_date<='2015-10-15 23:59:59' (assuming the order_date column's data type is datetime or timestamp).
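A small sketch of that rewrite (orders is a made-up table name, since the answer only names the column):

-- Wrapping the column in a function prevents MySQL from using INDEX(order_date):
SELECT * FROM orders WHERE DATE(order_date) = '2015-10-15';

-- Comparing the bare column against a range lets the index be used:
SELECT * FROM orders
WHERE order_date >= '2015-10-15 00:00:00'
  AND order_date <= '2015-10-15 23:59:59';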
How do I make an index for this query, and on which column?
Here I don't see any need to create an index, since your condition is on the books table's primary key, which is already indexed.
What are the other benefits or disadvantages of using indexes?
If you create indexes blindly, then every record insertion/update has to maintain each of those indexes, which slows things down. A very heavy set of indexes will also perform slowly and will consume more disk space.
In which cases should indexes be avoided, and where should they be used more?
If more than 70% of the data in a column is the same value (for example status or is_deleted type columns, where most rows will be active), there is no need to create an index on it.
Are indexes applicable to random queries?
Yes, indexes work on random queries; for repeatable queries you can use the query cache, which will be more efficient.
Are indexes more efficient on IDs?
Yes.
If I execute this query:
SELECT * FROM table1 WHERE name LIKE '%girl%'
It returns all records where name contains 'girl'. However, because of the leading wildcard % in the LIKE statement, it cannot (or does not) use indexes, as stated here: Mysql Improve Search Performance with wildcards (%%)
Then I changed the query to:
SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
On the left side of the OR I removed the leading wildcard so that it can use indexes. But the performance win depends on how MySQL evaluates the query.
Hence my question: does the performance of my query increase when I add the OR condition?
No, the performance will be the same. MySQL still has to evaluate the first condition (LIKE '%girl%') because of the OR; then it can evaluate the second condition using the index. You can see this when you EXPLAIN your query (MySQL will show that it still needs to do a full table scan, which means checking each row):
EXPLAIN SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
For better performance with these kinds of queries you would need to use FULLTEXT indexes and the special syntax for querying them. But FT indexes behave differently and are not suited to everything.
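A minimal sketch of the FULLTEXT approach (index name made up; note that FULLTEXT matches whole words, so MATCH ... AGAINST('girl') is not an exact substitute for LIKE '%girl%', which also matches substrings inside words):

ALTER TABLE table1 ADD FULLTEXT INDEX ft_name (name);
SELECT * FROM table1 WHERE MATCH(name) AGAINST ('girl');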
(This answer provides a summary of the comments, plus contradicts some of the previous notes.)
Leading wildcard:
SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
SELECT * FROM table1 WHERE name LIKE '%girl%'
Either of those will do a table scan and ignore any indexes. This is both because of the leading wildcard and the OR. (It will not use the index for 'girl%', contrary to what #Marki555 says -- it's not worth the extra effort.)
Range query via LIKE (no leading wildcard):
SELECT * FROM table1 WHERE name LIKE 'girl%'
will probably use INDEX(name) in the following way:
Drill down the BTree for that index to the first name starting with "girl";
Scan forward (in the index) until the last row starting with "girl";
For each item in step 2, reach over into the data to get *.
Since Step 3 can be costly, the optimizer first estimates how many rows will need to be touched in Step 2. If more than 20% (approx) of the table, it will revert to a table scan. (Hence, my use of "probably".)
"Covering index":
SELECT name FROM table1 WHERE name LIKE '%girl%'
This will always use INDEX(name), because the index "covers": all the columns in the SELECT are found in the index. Since an index looks and feels like a table, scanning the index is the best way to do the query, and since an index is usually smaller than the table, an index scan is usually faster than a table scan.
Here's a less obvious "covering index", but it applies only to InnoDB:
PRIMARY KEY(id)
INDEX(name)
SELECT id FROM table1 WHERE name LIKE '%girl%'
Every secondary key (name) in InnoDB implicitly includes the PK (id). Hence the index looks like (name, id). Hence all the columns in the SELECT are in the index. Hence it is a "covering index". Hence it will use the index and do an "index scan".
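A minimal sketch of that situation (the table definition here is an assumption to make the example concrete):

CREATE TABLE table1 (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(100),
    PRIMARY KEY (id),
    INDEX (name)
) ENGINE=InnoDB;

EXPLAIN SELECT id FROM table1 WHERE name LIKE '%girl%';
-- Expected to show "Using index" in the Extra column: the secondary index on
-- name implicitly carries id, so the query is answered by scanning that index.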
A "covering index" is indicated by Using index showing up in the EXPLAIN SELECT ....