I run this query on SQL Server 2008:
SELECT *
FROM Dealers WITH (INDEX(0))
WHERE ID = 'rrsdsd'
But the execution plan shows that it used the clustered index of the table.
Why so?
What you are telling SQL Server is to execute the query not using any indexes. Therefore, I would expect the query plan to show the clustered index being used (because that is your data), but a scan being done and not a seek. Is this the case?
According to the documentation:
If a clustered index exists, INDEX(0) forces a clustered index scan and INDEX(1) forces a clustered index scan or seek.
This is exactly what you are seeing.
Documentation says it all...
INDEX (index_value [,... n ] ) | INDEX = ( index_value) The INDEX()
syntax specifies the names or IDs of one or more indexes to be used by
the query optimizer when it processes the statement. The alternative
INDEX = syntax specifies a single index value. Only one index hint per
table can be specified.
If a clustered index exists, INDEX(0) forces a clustered index scan and INDEX(1) forces a clustered index scan or seek. If no
clustered index exists, INDEX(0) forces a table scan and INDEX(1) is
interpreted as an error.
If multiple indexes are used in a single hint list, the duplicates are
ignored and the rest of the listed indexes are used to retrieve the
rows of the table. The order of the indexes in the index hint is
significant. A multiple index hint also enforces index ANDing, and the
query optimizer applies as many conditions as possible on each index
accessed. If the collection of hinted indexes do not include all
columns referenced by the query, a fetch is performed to retrieve the
remaining columns after the SQL Server Database Engine retrieves all
the indexed columns.
Related
For select statements like
select * from table where indexed_col='abc';
sql would go to index table and fetch row address and return required.
But what about dml statements like
update table set indexed_col='abc' where condition;
how is consistency ensured between table and indexed table?
MySQL updates all indexes that include the column indexed_col when you update that column.
It must update the clustered index (aka primary key index) of course. The columns of a given row are stored in the leaf node of the clustered index, so changing any column requires updating that index.
Other unique indexes on the same table that include the updated column must be updated at the same time. In other words, when you execute UPDATE, the time it takes for that statement to execute includes the time to update the clustered index and also any unique indexes that include the column indexed_col.
For non-unique secondary indexes, MySQL's default storage engine InnoDB uses a change buffer, which is a temporary list of pending changes to those indexes. When you update the column indexed_col, MySQL adds an entry to the change buffer for each index that column is part of. Then it calls the execution of your UPDATE done, and returns control to the client.
If you subsequently do a SELECT query as you show, MySQL checks both the table's indexes and the change buffer. Any entries in the change buffer for that index take priority, since they reflect more recent changes.
Eventually, MySQL runs a background thread to merge change buffer entries into the respective index.
I have a table with index on a int column.
Create table sample(
col1 varchar,
col2 int)
Create index idx1 on sample(col2);
When I explain the following query
Select * from sample where col2>2;
It does a full table scan.
Why doesn't the indexing work here?
How can i optimize such queries when table has around 20 million records?
Just because you create an index, does not mean MySQL will always use it. According to the docs, here are several reasons why it may choose to use a full table scan over the index:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
There are no usable restrictions in the ON or WHERE clause for indexed columns.
You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 8.2.1.1, “WHERE Clause Optimization”.
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
You can use FORCE INDEX to ensure your query uses the index instead of allowing the optimizer to determine the appropriate path, although usually MySQL will take the most efficient approach.
SELECT * FROM t1, t2 FORCE INDEX (index_for_column) WHERE t1.col_name=t2.col_name;
Reference: https://dev.mysql.com/doc/refman/8.0/en/table-scan-avoidance.html
Reading the MySQL docs we see this example table with multiple-column index name:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
It is explained with examples in which cases the index will or will not be utilized. For example, it will be used for such query:
SELECT * FROM test
WHERE last_name='Widenius' AND first_name='Michael';
My question is, would it work for this query (which is effectively the same):
SELECT * FROM test
WHERE first_name='Michael' AND last_name='Widenius';
I couldn't find any word about that in the documentation - does MySQL try to swap columns to find appropriate index or is it all up to the query?
Should be the same because (from mysql doc) the query optiminzer work looking at
Each table index is queried, and the best index is used unless the
optimizer believes that it is more efficient to use a table scan. At
one time, a scan was used based on whether the best index spanned more
than 30% of the table, but a fixed percentage no longer determines the
choice between using an index or a scan. The optimizer now is more
complex and bases its estimate on additional factors such as table
size, number of rows, and I/O block size.
http://dev.mysql.com/doc/refman/5.7/en/where-optimizations.html
In some cases, MySQL can read rows from the index without even
consulting the data file.
and this should be you case
Without ICP, the storage engine traverses the index to locate rows in
the base table and returns them to the MySQL server which evaluates
the WHERE condition for the rows. With ICP enabled, and if parts of
the WHERE condition can be evaluated by using only fields from the
index, the MySQL server pushes this part of the WHERE condition down
to the storage engine. The storage engine then evaluates the pushed
index condition by using the index entry and only if this is satisfied
is the row read from the table. ICP can reduce the number of times the
storage engine must access the base table and the number of times the
MySQL server must access the storage engine.
http://dev.mysql.com/doc/refman/5.7/en/index-condition-pushdown-optimization.html
For the two queries you stated, it will work the same.
However, for queries which have only one of the columns, the order of the index matters.
For example, this will use the index:
SELECT * FROM test WHERE last_name='Widenius';
But this wont:
SELECT * FROM test WHERE first_name='Michael';
I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
there are around 200000 records that will be retreived from the above query. But the query along with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the understanding that,
Even with the partitions available, the optimizer needs to have the required partition mentioned in the index definition, so that it only hits the required partition ---- Correct?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by its cardinality (starting from the highest and go down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in customformattributes table while partitionflag will have cardinality of 2 (if all the options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you remove partitionflag from the index the query did not used the index because may be the optimizer decides that it will be faster to make full table scan instead of using the index (in most of the cases the optimizer is right)
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be would be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use index the following rule must be met - the left most column in the index should be present in the where clause ... and depending on the mysql version you are using additional query requirements may be needed. For example if you are using old version of mysql - you may need to order the columns in the where clause in the same order they are listed in the index. In the last versions of mysql the query optimizer is responsible for ordering the columns in the where clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is made correctly you can take the advantage of partitioning pruning - this is when SELECT query searches the data only in the partitions where given data is stored (otherwise it looks in all partitions)
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
I run the following query on my database :
SELECT e.id_dernier_fichier
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
And the query runs fine. If I modifiy the query like this :
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
The query becomes very slow ! The problem is I want to select many columns in table e and f, and the query can take up to 1 minute ! I tried different modifications but nothing works. I have indexes on id_* also on e.codega. Enfants has 9000 lines and FichiersEnfants has 20000 lines. Any suggestions ?
Here are the info asked (sorry not having shown them from the beginning) :
The difference in performance is possibly due to e.id_dernier_fichier being in the index used for the JOIN, but e.codega not being in that index.
Without a full definition of both tables, and all of their indexes, it's not possible to tell for certain. Also, including the two EXPLAIN PLANs for the two queries would help.
For now, however, I can elaborate on a couple of things...
If an INDEX is CLUSTERED (this also applies to PRIMARY KEYs), the data is actually physically stored in the order of the INDEX. This means that knowing you want position x in the INDEX also implicity means you want position x in the TABLE.
If the INDEX is not clustered, however, the INDEX is just providing a lookup for you. Effectively saying position x in the INDEX corresponds to position y in the TABLE.
The importance here is when accessing fields not specified in the INDEX. Doing so means you have to actually go to the TABLE to get the data. In the case of a CLUSTERED INDEX, you're already there, the overhead of finding that field is pretty low. If the INDEX isn't clustered, however, you effectifvely have to JOIN the TABLE to the INDEX, then find the field you're interested in.
Note; Having a composite index on (id_dernier_fichier, codega) is very different from having one index on just (id_dernier_fichier) and a seperate index on just (codega).
In the case of your query, I don't think you need to change the code at all. But you may benefit from changing the indexes.
You mention that you want to access many fields. Putting all those fields in a composite index is porbably not the best solution. Instead you may want to create a CLUSTERED INDEX on (id_dernier_fichier). This will mean that once the *id_dernier_fichier* has been located, you're already in the right place to get all the other fields as well.
EDIT Note About MySQL and CLUSTERED INDEXes
13.2.10.1. Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.