I am working on a database with large number of rows (6 Mil+).
This table has a composite primary key on two columns.
It also has separate index on each of those fields as there are queries that require this. Obviously, one of those indexes (indices?) is redundant and slowing down performance for write operations.
How do I find out which one is redundant? I understand the first column of a primary key is already indexed and need not be indexed separately. Is that correct? If so, is there a query I can run to find out which is the first one in the list?
SHOW INDEXES FROM tablename will include a Seq_in_index column, which tells you which is first (aka, left most) column, second column, etc.
Therefore, whichever column is listed with a value of 1 for Seq_in_index is the column that does not need it's own single column index.
You can also use SHOW CREATE TABLE tablename to see the index listed from left to right, and that order displayed correctly represents the order of columns in the index.
SHOW CREATE TABLE tablename gives you all the indexes, in their established order.
You don't need INDEX(a) because the column(s) in it are the first column(s) in the INDEX(a,b),
That applies to INDEX / UNIQUE / PRIMARY KEY in (a,b).
I understand the first column of a primary key is already indexed
Erm, no. All the columns in the primary key are indexed.
An explanation of how indexes work is stretching the scope of a post here, and the question of which indexes to put on your table is way too broad.
Suppose you have a primary key defined on attributes a,b,c. This index can be used for queries with predicates
a
a and b
a and b and c
But (at least, the last time I checked) it would not be used for a query with predicates
b
b and c
The optimizer will only ever use one index for each table in a query.
The right indexes depend on the volume of data, the cardinality of the data and the frequency and combination of predicates in your queries. There are execution and storage overheads when you start adding indexes, even just for select operations badly designed indexes can make your query slower than it would run without indexes.
Related
I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
there are around 200000 records that will be retreived from the above query. But the query along with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the understanding that,
Even with the partitions available, the optimizer needs to have the required partition mentioned in the index definition, so that it only hits the required partition ---- Correct?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by its cardinality (starting from the highest and go down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in customformattributes table while partitionflag will have cardinality of 2 (if all the options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you remove partitionflag from the index the query did not used the index because may be the optimizer decides that it will be faster to make full table scan instead of using the index (in most of the cases the optimizer is right)
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be would be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use index the following rule must be met - the left most column in the index should be present in the where clause ... and depending on the mysql version you are using additional query requirements may be needed. For example if you are using old version of mysql - you may need to order the columns in the where clause in the same order they are listed in the index. In the last versions of mysql the query optimizer is responsible for ordering the columns in the where clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is made correctly you can take the advantage of partitioning pruning - this is when SELECT query searches the data only in the partitions where given data is stored (otherwise it looks in all partitions)
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.
Let's use lastName as an example.
Assuming that there are no duplicate last names in your DB (by chance, not because of a unique), would there be any benefit to indexing this lastName column?
The query that would be used to search would be something like SELECT * IN t WHERE lastName='Smith'.
If every entry in the column is unique, then how can an index have an effect? Wouldn't it have to search every entry regardless?
Sorry, I am just learning about indexing and I would really like to understand it better.
Thanks.
Yes, there is a benefit in indexing even if the column values are unique. In the index the values are not only unique but they are also organised in a tree structure that lets you search for a row with O(log N) complexity.
There is a great article in Wikipedia about it: Database Index
...
The data is present in arbitrary order, but the logical ordering is specified
by the index. The data rows may be spread throughout the table
regardless of the value of the indexed column or expression. The
non-clustered index tree contains the index keys in sorted order, with
the leaf level of the index containing the pointer to the record (page
and the row number in the data page in page-organized engines; row
offset in file-organized engines).
In a non-clustered index
The physical order of the rows is not the same as the index order. The
indexed columns are typically non-primary key columns used in JOIN,
WHERE, and ORDER BY clauses. There can be more than one non-clustered
index on a database table.
...
Consider the following SQL statement:
SELECT first_name FROM people WHERE last_name = 'Smith';
To process this statement without an index
the database software must look at the last_name column on every row
in the table (this is known as a full table scan). With an index the
database simply follows the B-tree data structure until the Smith
entry has been found; this is much less computationally expensive than
a full table scan.
Generally speaking the more unique values there are in a column, or the higher its cardinality What is cardinality in MySQL?, the more useful an index will be on that column.
Let us consider I have a table with 60 columns , I need to perform all kind of queries on that table and need to join that table with other tables as well. And I almost using all rows for searching data in that table including other tables. This table is the like a primary table(like a primary key) in the database. So all table are in relation with this table.
By considering the above scenario can I create index on each column on the table (60 columns )
,is it good practice ?
In single sentence:
Is it best practice to create index on each column in a table ?
What might happens if I create index on each column in a table?
Where index might be "Primary key", "unique key" or "index"
Please comment, if this question is unclear for you people I will try to improve this question.
MySQL's documentation is pretty clear on this (in summary use indices on columns you will use in WHERE, JOIN, and aggregation functions).
Therefore there is nothing inherently wrong with creating an index on all columns in a table, even if it is 60 columns. The more indices there are the slower inserts and some updates will be because MySQL has to create the keys, but if you don't create the indices MySQL has to scan the entire table if only non-indexed columns are used in comparisons and joins.
I have to say that I'm astonished that you would
Have a table with 60 columns
Have all of those columns used either in a JOIN or WHERE clause without dependency on any other column in the same table
...but that's a separate issue.
It is not best practice to create index on each column in a table.
Indexes are most commonly used to improve query performance when the column is used in a where clause.
Suppose you use this query a lot:
select * from tablewith60cols where col10 = 'xx';
then it would be useful to have an index on col10.
Note that primary keys by default have an index on them, so when you join the table with other tables you should use the primary key to join.
Adding an index means that the database has to maintain it, that means that it has to be updated, so the more writes you have, the more the index will be updated.
Creating index out of the box is not a good idea, create an index only when you need it (or when you can see the need in the future... only if it is pretty obvious)
creating more index in SQL will increase only search speed while you will get slowness of insert and update and also it will take more storage.
I run the following query on my database :
SELECT e.id_dernier_fichier
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
And the query runs fine. If I modifiy the query like this :
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
The query becomes very slow ! The problem is I want to select many columns in table e and f, and the query can take up to 1 minute ! I tried different modifications but nothing works. I have indexes on id_* also on e.codega. Enfants has 9000 lines and FichiersEnfants has 20000 lines. Any suggestions ?
Here are the info asked (sorry not having shown them from the beginning) :
The difference in performance is possibly due to e.id_dernier_fichier being in the index used for the JOIN, but e.codega not being in that index.
Without a full definition of both tables, and all of their indexes, it's not possible to tell for certain. Also, including the two EXPLAIN PLANs for the two queries would help.
For now, however, I can elaborate on a couple of things...
If an INDEX is CLUSTERED (this also applies to PRIMARY KEYs), the data is actually physically stored in the order of the INDEX. This means that knowing you want position x in the INDEX also implicity means you want position x in the TABLE.
If the INDEX is not clustered, however, the INDEX is just providing a lookup for you. Effectively saying position x in the INDEX corresponds to position y in the TABLE.
The importance here is when accessing fields not specified in the INDEX. Doing so means you have to actually go to the TABLE to get the data. In the case of a CLUSTERED INDEX, you're already there, the overhead of finding that field is pretty low. If the INDEX isn't clustered, however, you effectifvely have to JOIN the TABLE to the INDEX, then find the field you're interested in.
Note; Having a composite index on (id_dernier_fichier, codega) is very different from having one index on just (id_dernier_fichier) and a seperate index on just (codega).
In the case of your query, I don't think you need to change the code at all. But you may benefit from changing the indexes.
You mention that you want to access many fields. Putting all those fields in a composite index is porbably not the best solution. Instead you may want to create a CLUSTERED INDEX on (id_dernier_fichier). This will mean that once the *id_dernier_fichier* has been located, you're already in the right place to get all the other fields as well.
EDIT Note About MySQL and CLUSTERED INDEXes
13.2.10.1. Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
I have a table with 8 columns with mixed datatypes (int, text, varchar, etc..)
This table only has one query that operates on it (a SELECT), and within that query it uses 4 "AND" statements to filter the results.
My question is: what are some guidelines for indexing in this situation?
In my case, read times have a higher priority than write times, so should I index all the columns that appear in an AND clause? Should I only index the integers?
For just one query, make an index with all of the columns affected.
MySQL's indexes work from left to right, meaning that if you have a table with the columns:
A varchar(255)
B varchar(50)
C int(11)
D datetime
An index on KEY index (A,B,C,D) means that MySQL will look for the columns in your query, from left to right. If you include them all in your query, it will use them all but if you only include A and B, it will use these to filter down the results before applying the rest of the conditions to the remaining result set (assuming you're using AND).
However if you only include C and D, it will not use this index at all since it doesn't know what to look for in A.
If you take a quick look at how MySQL uses its indexes you should be able to work out the best columns to index.
I think you should create an index for every column, you need for your select statements. Notice, that there is a limit in mysql on a length of a field, you create an index on. The created index can not exceed 1000 bytes for MyISAM and 767 for InnoDb (thx OMG Ponies). Here you can find an discussion about assessment of the size of the index for varchars column. If you don't have writes, it should significantly boost performance of select statements.