What is the difference between single or composite column indexes? [duplicate] - mysql

In any relational database, we can create indexes that boost query speed. But creating more indexes can hurt insert/update speed, because the DB system has to update each index when new data comes in (insert, update, merge, etc.).
Let's use an example.
We can create an index called index1:
ALTER TABLE tablename ADD INDEX index1 (order_id ASC, buyer_id ASC);
Or we can create two indexes, index2 and index3:
ALTER TABLE tablename ADD INDEX index2 (order_id ASC);
ALTER TABLE tablename ADD INDEX index3 (buyer_id ASC);
In a query like this
select * from tablename where order_id>100 and buyer_id>100
Which one is faster: using index1, or using index2 and index3?
On the other side of the equation, when inserting or updating, I assume it will be much faster to maintain one index instead of two, but I haven't tested it against MySQL or MS SQL Server, so I can't be sure. If anyone has experience with that, please share it.
And the last thing is about int-typed values: I thought it's not possible, or not relevant, to create an index on int columns alone because it doesn't improve query time. Is that true?

The performance of an index is linked to its selectivity. Whether to use two indexes or one composite index must be assessed in the context of the application, especially for queries whose performance is critical: based on the fields in the WHERE clause, the index should reduce as much as possible the number of rows to process (and to feed into joins).
In your case, since an order usually has only one buyer, the index (order_id, buyer_id) adds little selectivity over order_id alone (apart from its usefulness in join operations). The reverse, (buyer_id, order_id), would be more useful, because it makes searching for the orders of a buyer easy.

For the exact query you mentioned I would personally go for index1: a single index serves both conditions. (Strictly speaking, because both conditions are ranges, only order_id > 100 determines which part of the B-tree is read; buyer_id > 100 can then be checked against the index entries, via index condition pushdown in MySQL 5.6+, before the table rows are read.) The same index also does the job if you filter by order_id only, because order_id is the first column of the index, so the same B-tree structure still helps even if you omit the buyer.
At the same time index1 would not help much if you filter by buyer_id only, because the B-tree is ordered first by order_id (as per the index creation statement), which is missing from that filter. You will probably end up with an index scan with index1, while separate indexes would still work in that scenario (a seek on index3 is what should be expected).
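A minimal sketch of the leftmost-prefix behavior described above, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite's planner is not MySQL's, but its EXPLAIN QUERY PLAN output shows the same B-tree prefix rules). The queries select only order_id and buyer_id so the index can answer them on its own; the id and note columns are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
# Hypothetical table shape; only order_id and buyer_id come from the question.
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, order_id INT, buyer_id INT, note TEXT)"
)
# index1 from the question: composite, order_id first.
conn.execute("CREATE INDEX index1 ON tablename (order_id, buyer_id)")

plans = {
    # Equality on both columns: both are used to position the seek.
    "both_eq": query_plan(conn, "SELECT order_id, buyer_id FROM tablename "
                                "WHERE order_id = 100 AND buyer_id = 100"),
    # Two ranges: only the leading range (order_id) positions the seek.
    "both_range": query_plan(conn, "SELECT order_id, buyer_id FROM tablename "
                                   "WHERE order_id > 100 AND buyer_id > 100"),
    # Leading column missing: no seek, at best a full scan of the index.
    "buyer_only": query_plan(conn, "SELECT order_id, buyer_id FROM tablename "
                                   "WHERE buyer_id > 100"),
}
for name, plan in plans.items():
    print(f"{name:11} {plan}")
```

The buyer_only plan should be a SCAN rather than a SEARCH, which is the "index scan with index1" case; in MySQL's EXPLAIN the equivalent shows up as type=index.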

Related

Avoiding mysql filesort

I have a client running a PHP photo gallery (on PHP 5.5, MySQL 5.5, using MyISAM tables) that uses the directory tree method. Unfortunately, some of the queries in their gallery application are causing horribly long filesorts. The offending query:
SELECT `name`, `slug`
FROM `db_table`
WHERE `left_ptr` <= '914731'
AND `right_ptr` >= '914734'
AND `id` <> 1
ORDER BY `left_ptr` ASC
There are indexes on id, left_ptr and right_ptr, but according to the EXPLAIN, none of them are being used in the query.
I heard that creating a composite index (on the 'condition' columns) would make things faster, but does that apply to this case? The last condition is really just an 'anything but 1' clause, so would a composite index apply to that too? Thanks for any insight into this.
Yes, a composite index on (left_ptr, right_ptr) should make this query run better.
MySQL will normally use only one index per table in a query (the index merge optimization aside). It's likely not using any single index because it has determined that no single index would be much faster than a full table scan. For example, id <> 1 matches every row but one, so a full table scan is just as good. The other two filters depend on how the data is distributed, but if they don't filter out a significant portion of the table, MySQL won't use an index.
Don't bother adding id to the index: as above, id <> 1 only filters out one row.
MySQL can use the first column of a composite index alone, so this composite index also replaces the one on left_ptr alone.
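A small sketch of why the composite index removes the sort, again using Python's sqlite3 as a stand-in for MySQL/MyISAM (an assumption; SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY" where MySQL's EXPLAIN says "Using filesort"). The table layout and the index name idx_ptrs are hypothetical; the query is the one from the question, with the quoted numbers written as integers.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as a list."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


QUERY = (
    "SELECT name, slug FROM db_table "
    "WHERE left_ptr <= 914731 AND right_ptr >= 914734 AND id <> 1 "
    "ORDER BY left_ptr ASC"
)

conn = sqlite3.connect(":memory:")
# Hypothetical layout for the nested-set (left/right pointer) table.
conn.execute(
    "CREATE TABLE db_table (id INTEGER PRIMARY KEY, name TEXT, slug TEXT, "
    "left_ptr INT, right_ptr INT)"
)
before = query_plan(conn, QUERY)  # no usable index: full scan plus an explicit sort

conn.execute("CREATE INDEX idx_ptrs ON db_table (left_ptr, right_ptr)")
after = query_plan(conn, QUERY)   # rows are read from the index already ordered by left_ptr

print("before:", before)
print("after: ", after)
```

As in MySQL, right_ptr cannot narrow the range here because it follows a range condition on left_ptr; the gain comes from the left_ptr range plus the free ordering.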

One composite index or many indexes for foreign keys?

What is the difference between creating a covering index for all the foreign keys of a relation table and creating one index for each column (foreign key) of the relation table?
For instance, I have the table sales(p_id, e_id, c_id, amount) where p_id is a foreign key (products table), e_id is a foreign key (employee table) and c_id a foreign key (customer_table). The primary key of the table is {p_id, e_id, c_id}.
Which one is better?
CREATE INDEX cmpindex ON sales(p_id, e_id, c_id)
OR
CREATE INDEX pindex on sales(p_id)
CREATE INDEX eindex on sales(e_id)
CREATE INDEX cindex on sales(c_id)
I mostly run queries with joins on the relation table and the parent tables.
Which one is better depends on your actual queries.
One thing to understand is that when you join the table sales once in your query, it will only use one index for it (at the most). So you need to make sure an index is available that is most appropriate for the query.
If you join the sales table always to all three other tables (customer, product and employee) then a composite index is to be preferred, assuming that the engine will actually use it and not perform a table scan.
The order of the fields in the composite index is important when it comes to the order of the results. For instance, if your query is going to group the results by product (first), and then order the details per customer, you could benefit from an index that has the product id first, and the customer id as second.
But it may also be that the engine decides that it is better to start scanning the table sales first and then join in the other three tables using their respective primary key indexes. In that case no index is used that exists on the sales table.
The only way to find out is to get the execution plan of your query and see which indexes will be used when they are defined.
If you only have one query on the sales table, there is no need to have several indexes. But more likely you have several queries which output completely different results, with different field selections, filters, groupings, etc.
In that case you may need several indexes, some of which will serve for one type of query, and others for others. Note that what you propose is not mutually exclusive. You could maybe benefit from several composite indexes, which just have a different order of fields.
Obviously, a multitude of indexes will slow down data changes in those tables, so you need to consider that trade-off as well.
Note that an index on a compound key will only be used for a seek if you query on the first portion, the first and second portions, the first, second and third portions, etc. So querying on p_id, or on p_id and e_id (in either order in the WHERE clause), will use the index, and any query that filters on p_id can use at least the first portion of this index.
However, if you query your sales table on e_id or c_id or any combination of these two, cmpindex cannot be used for a seek, and unless another index applies, a full table scan will be performed.
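A quick way to check the leftmost-prefix rule for cmpindex, sketched with Python's sqlite3 as a stand-in for MySQL (an assumption; the prefix rule is the same for B-tree indexes in both). The table is declared without its primary key here so that cmpindex is the only index in play; the literal values are arbitrary.

```python
import sqlite3


def seek_columns(conn, where):
    """Return SQLite's plan detail for SELECT * FROM sales WHERE <where>."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE " + where
    return " | ".join(row[-1] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
# Primary key deliberately omitted so cmpindex is the only candidate index.
conn.execute("CREATE TABLE sales (p_id INT, e_id INT, c_id INT, amount NUMERIC)")
conn.execute("CREATE INDEX cmpindex ON sales (p_id, e_id, c_id)")

plans = {
    "p_id": seek_columns(conn, "p_id = 1"),
    "p_id, e_id": seek_columns(conn, "p_id = 1 AND e_id = 2"),
    "e_id, p_id": seek_columns(conn, "e_id = 2 AND p_id = 1"),  # WHERE order is irrelevant
    "p_id, c_id": seek_columns(conn, "p_id = 1 AND c_id = 3"),  # gap at e_id: seek on p_id only
    "e_id, c_id": seek_columns(conn, "e_id = 2 AND c_id = 3"),  # no leading column: no seek
}
for name, plan in plans.items():
    print(f"{name:11} {plan}")
```

The seek columns listed in parentheses in each plan show how much of cmpindex is actually used.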
One benefit of having an index on each foreign key (a non-unique index, as there could be multiple sales of the same product, or by the same employee, or to the same customer, leading to duplicate entries in the index) is that the query optimizer has the option of using the index to reduce the number of rows returned, and then doing a sequential search through the result set.
E.g. if the query is a search on sales of a particular product to a particular customer (regardless of employee) and you have a million sales, the foreign key index cindex could be used to return 20 sales items to that particular customer, and that result set could be very efficiently searched sequentially to find which of these sales were for a particular product.
If the search was performed on Product and pindex was used, the result set may be 10,000 rows (all sales of that product), which would have to be sequentially searched to find the sales of that product to a particular customer, leading to a very inefficient query.
I believe that the statistics kept for a table (used by the optimizer) keep track of the average number of rows that will be returned for a query using each index, so the optimizer will be able to work out that cindex should be used rather than pindex in the examples above. Alternatively, you can give hints on your queries to specify that a particular index be used.
It is, obviously, important to keep these statistics up to date (UPDATE STATISTICS in SQL Server, ANALYZE TABLE in MySQL), as the execution plan would use pindex in the example above if there were, on average, only 10 sales of each product.
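The statistics-driven choice between pindex and cindex can be sketched with Python's sqlite3, where ANALYZE plays the role of refreshing optimizer statistics (an assumption: SQLite's cost model differs from MySQL's and SQL Server's, but it also relies on average rows per key). The data is made up to match the shape of the example: 2 products with 10,000 sales each and 1,000 customers with 20 sales each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary key omitted so only the two single-column foreign key indexes compete.
conn.execute("CREATE TABLE sales (p_id INT, e_id INT, c_id INT, amount NUMERIC)")
conn.execute("CREATE INDEX pindex ON sales (p_id)")
conn.execute("CREATE INDEX cindex ON sales (c_id)")

# Made-up data: 2 products (10,000 sales each), 10 employees, 1,000 customers (20 sales each).
rows = [(i % 2, i % 10, i % 1000, 10.0) for i in range(20_000)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Refresh optimizer statistics (SQLite stores them in sqlite_stat1).
conn.execute("ANALYZE")

stats = dict(conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'sales'"))
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE p_id = 1 AND c_id = 5")]

print(stats)  # per index: "rows-in-index average-rows-per-key"
print(plan)
```

With these statistics the planner should pick cindex (about 20 rows per customer) over pindex (about 10,000 rows per product) and then filter the few matching rows on p_id.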
If your queries search sales by each of these keys independently, then you must create a separate index for each.
If that's not necessary, then you can go for a composite index.
As HoneyBadger commented, you already have a composite index, since your primary key is itself an index.
In general, you should use a single index for each column whenever you think you will have queries involving each field by itself.
As stated here, when you have a composite index, it can work with queries involving all fields, or with queries involving the first field (in order), the first and the second, or the first, second and third together. It won't be used in queries involving only the second and third fields.
The other answers are missing an important point. When you declare a foreign key in MySQL (InnoDB), it creates an index on the column unless a suitable index (one with the foreign key column first) already exists. This is not (necessarily) true in other databases, but it is true in MySQL.
So, the declarations can automatically create these indexes:
CREATE INDEX pindex on sales(p_id);
CREATE INDEX eindex on sales(e_id);
CREATE INDEX cindex on sales(c_id);
(These indexes are very handy for dealing with cascading constraints and maintaining the data integrity based on the foreign key.)
If you happen to have also declared an index on sales(p_id, e_id, c_id, amount), then the first of the indexes is not needed -- it is a subset of this index. However, the other two are needed.
Is this index needed? As mentioned in other questions, that depends on the queries you want to use the index for. I recommend starting with the documentation on this subject to understand how the indexes get used.

MySQL covering index optimization? [duplicate]

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then for this SQL:
SELECT column1, column2
FROM tablename
WHERE criteria
provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so the database won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used deliberately: if a typical query uses 1-2 columns to resolve which rows to fetch and then selects another 1-2 columns, it can be beneficial to append those extra columns (if they're the same across queries) to the index, so that the query processor can get everything from the index itself.
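A minimal sketch of the difference, using Python's sqlite3 as a stand-in (an assumption; SQLite labels index-only access as "COVERING INDEX" in EXPLAIN QUERY PLAN, where MySQL's EXPLAIN shows "Using index" in the Extra column). The table, the index name idx_cols and the condition column1 = 42, standing in for criteria, are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, column1 INT, column2 INT, "
    "column3 INT, other TEXT)"
)
conn.execute("CREATE INDEX idx_cols ON tablename (column1, column2, column3)")

# Every selected column is in idx_cols: the index alone answers the query.
covered = query_plan(conn, "SELECT column1, column2 FROM tablename WHERE column1 = 42")
# SELECT * also needs `other`, so each match requires a trip to the table.
not_covered = query_plan(conn, "SELECT * FROM tablename WHERE column1 = 42")

print("covered:    ", covered)
print("not covered:", not_covered)
```

Both queries use idx_cols to find the rows; only the first one avoids reading the table.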
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without having to read the table data.
Example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/update overhead, due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly, and more likely than single-table retrievals to suffer serious performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have indexed only Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it's using an index, and whether it performs efficiently without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
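When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows need to be fetched from the table using I/O calls. So this is not a covering index, since a column used in the query is not in the index, which leads to frequent I/O calls.
To make it a covered query you need a composite index containing both columns, with the filtered column first: (Telephone_Number, Id). An index on (Id, Telephone_Number) would also contain both columns, but it could only be scanned in full, not searched by Telephone_Number. (In InnoDB, a secondary index on Telephone_Number alone already covers this query, because secondary indexes store the primary key.)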
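Column order decides whether such an index can be searched, not just read. A sketch with Python's sqlite3 (an assumption: in SQLite, Id INTEGER PRIMARY KEY is the rowid and is stored in every index entry, which is similar to how InnoDB secondary indexes carry the primary key). The index names are hypothetical and the phone number is written as an integer.

```python
import sqlite3

TABLE = ("CREATE TABLE mytable (Id INTEGER PRIMARY KEY, Telephone_Number INT, "
         "Name VARCHAR(100), Address VARCHAR(255))")
QUERY = "SELECT Id FROM mytable WHERE Telephone_Number = 55442233"


def plan_with(index_ddl):
    """Build a fresh in-memory table with one extra index and return the query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(TABLE)
    conn.execute(index_ddl)
    detail = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
    conn.close()
    return detail


plans = {
    "id_first": plan_with("CREATE INDEX idx_id_phone ON mytable (Id, Telephone_Number)"),
    "phone_first": plan_with("CREATE INDEX idx_phone_id ON mytable (Telephone_Number, Id)"),
    "phone_only": plan_with("CREATE INDEX idx_phone ON mytable (Telephone_Number)"),
}
for name, plan in plans.items():
    print(f"{name:12} {plan}")
```

The (Id, Telephone_Number) index yields at best a full scan, while the other two allow a covered SEARCH on Telephone_Number.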
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/

Index on mysql partitioned tables

I have a table with two partitions. The partitions are pactive = 1 and pinactive = 0. I understand that two partitions do not bring much of a gain, but I have used them to truncate and load one partition and do plain inserts into the other.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
There are around 200000 records that will be retrieved by the above query, but the query with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if I remove partitionflag from the mentioned index, the index is not even used.
And is this understanding correct: even with partitions available, the optimizer needs the partitioning column (partitionflag) to be included in the index definition, so that it only hits the required partition?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in an index are ordered by their cardinality (starting from the highest and going down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in the customformattributes table, while partitionflag will have a cardinality of 2 (if the only values for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you removed partitionflag from the index, the query did not use the index, probably because the optimizer decided that a full table scan would be faster than using the index (in most cases the optimizer is right).
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index would probably be better:
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use the index, the following rule must be met: the leftmost column of the index should be present in the WHERE clause ... and depending on the MySQL version you are using, additional query requirements may apply. For example, if you are using an old version of MySQL, you may need to order the columns in the WHERE clause in the same order they are listed in the index. In recent versions of MySQL the query optimizer is responsible for ordering the columns in the WHERE clause correctly.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is done correctly you can take advantage of partition pruning: the SELECT query searches only the partitions where the given data is stored (otherwise it looks in all partitions).
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
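A sketch of the "in any order" point, using Python's sqlite3 as a stand-in for MySQL (an assumption; partitioning itself is not modeled). It builds the three-column index in all six column orders and checks that each one serves all three equality tests and covers the query. The table's id column is hypothetical.

```python
import itertools
import sqlite3

TABLE = (
    "CREATE TABLE customformattributes (id INTEGER PRIMARY KEY, partitionflag INT, "
    "companyid INT, activityname TEXT, completiondate TEXT, attributename TEXT, isclosed INT)"
)
QUERY = (
    "SELECT partitionflag, companyid, activityname FROM customformattributes "
    "WHERE companyid = 47 AND activityname = 'Activity 1' AND partitionflag = 0"
)

plans = {}
for order in itertools.permutations(("companyid", "activityname", "partitionflag")):
    conn = sqlite3.connect(":memory:")
    conn.execute(TABLE)
    conn.execute(f"CREATE INDEX idx_try ON customformattributes ({', '.join(order)})")
    plans[order] = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
    conn.close()

for order, detail in plans.items():
    print(order, "->", detail)
```

Because all three tests are equalities, every column order lets the seek use all three columns; a range or an ORDER BY would make the order matter again.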
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.

Mysql query tuning (large data set) and explain plan

I am using MySQL 5.1. I have a table with about 15 lakh (1.5 million) records. This table has records for different entities, i.e. child records for all master entities.
There are 8 columns in this table, 6 of which are combined to make the primary key.
These columns could be individual foreign keys, but for performance reasons we have made this change.
Even a simple select statement with two conditions is taking 6-8 seconds. Below is the explain plan for it.
Query
explain extended
select distinct location_code, Max(trial_number) as replication
from status_trait t
where t.status_id='N02'
and t.trial_data='orange'
group by location_code
The results of EXPLAIN EXTENDED
id  select_type  table  type   possible_keys  key                        key_len  ref   rows     filtered  Extra
1   SIMPLE       t      index  NULL           FK_HYBRID_EXP_TRAIT_DTL_2  5        NULL  1481572  100.00    Using where; Using index
I have these questions:
How should I handle tables with this much data?
Is the indexing on this table fine?
Two things might help you here.
First, SELECT DISTINCT is pointless in an aggregating query. Just use SELECT.
Second, you didn't disclose the indexes you have created. However, to satisfy this query efficiently, the following compound covering index will probably help a great deal.
(status_id, trial_data, location_code, trial_number)
Why is this the right index? Because MySQL indexes are organized as BTREE. This organization allows the server to random-access the index to find particular values. In your case you want particular values of status_id and trial_data. Once the server has random-accessed the index, it can then scan sequentially. In this case you hope to scan for various values of location_code. The server knows it will find those different values already in order. Finally, the server needs to pluck out values of trial_number to use in your MAX() function. Lo and behold, there they are in the index ready for the plucking.
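The index walk described above can be sketched with Python's sqlite3 as a stand-in for MySQL 5.1 (an assumption; the plan wording differs, but the idea is the same). The sample rows are random made-up data, the extra column named other is hypothetical, and idx_status_trial is a hypothetical name for the suggested (status_id, trial_data, location_code, trial_number) index. The query is the one from the question without DISTINCT, as suggested.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_trait (status_id TEXT, trial_data TEXT, "
    "location_code TEXT, trial_number INT, other TEXT)"
)
# The compound covering index suggested above (hypothetical name).
conn.execute(
    "CREATE INDEX idx_status_trial "
    "ON status_trait (status_id, trial_data, location_code, trial_number)"
)

rng = random.Random(42)  # made-up sample data, seeded for repeatability
rows = [
    (rng.choice(["N01", "N02"]), rng.choice(["orange", "apple"]),
     f"LOC{rng.randint(1, 5)}", rng.randint(1, 100), "x")
    for _ in range(1_000)
]
conn.executemany("INSERT INTO status_trait VALUES (?, ?, ?, ?, ?)", rows)

SQL = (
    "SELECT location_code, MAX(trial_number) AS replication FROM status_trait "
    "WHERE status_id = 'N02' AND trial_data = 'orange' GROUP BY location_code"
)
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL)]
result = conn.execute(SQL).fetchall()

# Same aggregation in plain Python, to confirm the indexed query's answer.
expected = {}
for status_id, trial_data, location_code, trial_number, _ in rows:
    if status_id == "N02" and trial_data == "orange":
        expected[location_code] = max(expected.get(location_code, trial_number), trial_number)

print(plan)
print(result)
```

The plan should show a covered SEARCH on (status_id, trial_data) with no temporary B-tree for the GROUP BY, because location_code values already arrive in index order.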
(If you're doing a lot of aggregation and querying of large tables, it makes sense for you to learn how compound and covering indexes work.)
There's a cost to adding an index: when you INSERT or UPDATE rows, you have to update your index as well. But this kind of index will greatly accelerate your retrieval.