MySQL 5.0 indexes - Unique vs Non Unique - mysql

What is the difference between MySQL unique and non-unique index in terms of performance?
Let us say I want to make an index on a combo of 2 columns, and the combination is unique, but I create a non-unique index. Will that have any significant effect on the performance or the memory MySQL uses?
Same question: is there a difference between a primary key and a unique index?

UNIQUE and PRIMARY KEY are constraints, not indexes, though most databases implement these constraints by using an index. The additional overhead of the constraint in addition to the index is insignificant, especially when you count the cost of tracking down and correcting unintentional duplicates when (not if) they occur.
Indexes are usually more effective if you have high selectivity. This is the ratio of the number of distinct values to the total number of rows.
For example, in a column for Social Security Number, you may have 1 million rows with 1 million distinct values. So the selectivity is 1000000/1000000 = 1.0 (although there are rare historical exceptions, SSNs are intended to be unique).
But another column in that table, "gender", may only have two distinct values over 1 million rows: 2/1000000 is very low selectivity.
An index with a UNIQUE or PRIMARY KEY constraint is guaranteed to have a selectivity of 1.0, so it will always be as effective as an index can be.
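To make the ratio concrete, here is a small Python sketch that computes selectivity as distinct values divided by total rows. The column data is made up to mirror the SSN and gender example (one value per row versus two values overall). It illustrates the arithmetic only, not how MySQL gathers index statistics.

```python
def selectivity(values):
    """Ratio of distinct values to total rows (1.0 means every value is distinct)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Hypothetical sample data standing in for the SSN and gender columns.
n_rows = 1_000_000
ssn_column = range(n_rows)                                      # every value distinct
gender_column = ["F" if i % 2 else "M" for i in range(n_rows)]  # only two values

print(selectivity(ssn_column))     # 1.0
print(selectivity(gender_column))  # 2e-06
```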
You asked about the difference between a primary key and a unique constraint. Chiefly, it's that you can have only one primary key constraint per table (even if that constraint's definition includes multiple columns), whereas you can have multiple unique constraints. A column with a unique constraint may permit NULLs, whereas columns in primary key constraints must not permit NULLs. Otherwise, primary key and unique are very similar in their implementation and their use.
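A quick way to see the two differences is Python's built-in sqlite3 module. This is SQLite, not MySQL, so treat it as a sketch. The behaviors shown here (several NULLs allowed in a UNIQUE column, only one PRIMARY KEY per table) match what MySQL/InnoDB does, but the engines differ in plenty of other ways. The users table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one primary key, plus a UNIQUE column that allows NULL.
conn.execute("""
    CREATE TABLE users (
        id    INTEGER NOT NULL PRIMARY KEY,
        email TEXT UNIQUE
    )
""")

rejected = []

# Several rows may hold NULL in the UNIQUE column.
conn.execute("INSERT INTO users (id, email) VALUES (1, NULL)")
conn.execute("INSERT INTO users (id, email) VALUES (2, NULL)")

# A repeated non-NULL value violates the UNIQUE constraint.
conn.execute("INSERT INTO users (id, email) VALUES (3, 'a@example.com')")
try:
    conn.execute("INSERT INTO users (id, email) VALUES (4, 'a@example.com')")
except sqlite3.IntegrityError as exc:
    rejected.append(str(exc))

# A table may declare only one PRIMARY KEY (a multi-column PK is still one constraint).
try:
    conn.execute("CREATE TABLE bad (a INTEGER PRIMARY KEY, b INTEGER PRIMARY KEY)")
except sqlite3.DatabaseError as exc:  # SQLite reports this as a schema error
    rejected.append(str(exc))

null_emails = conn.execute("SELECT COUNT(*) FROM users WHERE email IS NULL").fetchone()[0]
print(null_emails)    # 2
print(len(rejected))  # 2
```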
You asked in a comment about whether to use MyISAM or InnoDB. MySQL calls these storage engines. There are a bunch of subtle differences between the two, but the chief ones are:
InnoDB supports transactions, so you can choose to roll back or commit changes. MyISAM is effectively always autocommit.
InnoDB enforces foreign key constraints. MyISAM doesn't enforce or even store foreign key constraints.
If these features are things you need in your application, then you should use InnoDB.
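Here is what "roll back or commit" means in practice, sketched with Python's sqlite3 module. SQLite is used only as a stand-in for a transactional engine such as InnoDB. It is not MySQL, and the accounts table is hypothetical. An explicit BEGIN ... ROLLBACK leaves the data untouched, while BEGIN ... COMMIT keeps the change. In an always-autocommit engine, each UPDATE would be permanent as soon as it ran.

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from opening transactions
# implicitly, so BEGIN / ROLLBACK / COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

def balance():
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("ROLLBACK")  # discard the change
after_rollback = balance()

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("COMMIT")    # keep the change
after_commit = balance()

print(after_rollback, after_commit)  # 100 70
```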
To respond to your comment, it's not that simple. InnoDB is actually faster than MyISAM in quite a few cases, so it depends on your application's mix of selects, updates, concurrent queries, indexes, buffer configuration, etc.
See http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/ for a very thorough performance comparison of the storage engines. InnoDB wins over MyISAM frequently enough that it's clearly not possible to say one is faster than the other.
As with most performance-related questions, the only way to answer it for your application is to test both configurations using your application and a representative sample of data, and measure the results.

Between a non-unique index that just happens to hold unique values and a unique index? I'm not sure, but I'd guess not a lot. The optimiser should examine the cardinality of the index and use that (for a unique index, it will always equal the number of rows).
As far as a primary key is concerned, probably quite a lot, but it depends which engine you use.
The InnoDB engine (which is used by many people) always clusters rows on the primary key. This means that the PK is essentially combined with the actual row data. If you're doing a lot of lookups by PK (or indeed, range scans etc), this is a Good Thing, because it means that it won't need to fetch as many blocks from the disc.
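As long as the table has a PRIMARY KEY, a non-PK unique index will never be clustered in InnoDB. (If no PRIMARY KEY is defined, InnoDB clusters on the first UNIQUE index whose columns are all NOT NULL.)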
On the other hand, some other engines (MyISAM in particular) don't cluster the PK, so the primary key is just like a normal unique index.
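Here is a toy Python model of why clustering helps range scans. It counts how many fixed-size pages must be read to fetch a range of primary keys in two layouts. In the first, rows are stored in arrival order (heap storage, roughly the MyISAM arrangement). In the second, rows are stored in primary-key order (clustered, roughly the InnoDB arrangement). The model assumes rows arrive in random key order and that every page holds exactly 100 rows. Both are simplifications: real pages are neither full nor this tidy, and a heap filled in key order would not be scattered.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical page capacity
N = 10_000

random.seed(42)
arrival_order = list(range(N))
random.shuffle(arrival_order)  # assumption: rows arrive in random primary-key order

# Heap storage (MyISAM-like): rows sit where they were appended, and the
# PK index maps each key to that position.
heap_position = {pk: pos for pos, pk in enumerate(arrival_order)}

# Clustered storage (InnoDB-like): rows are kept in PK order, so the row with
# the k-th smallest key sits at position k (the keys here are exactly 0..N-1).
clustered_position = {pk: pk for pk in range(N)}

def pages_touched(position, lo, hi):
    """Number of distinct pages read to fetch every row with lo <= pk < hi."""
    return len({position[pk] // ROWS_PER_PAGE for pk in range(lo, hi)})

clustered_pages = pages_touched(clustered_position, 5000, 5200)
heap_pages = pages_touched(heap_position, 5000, 5200)
print(clustered_pages)  # 2
print(heap_pages)       # far more, because the 200 rows are scattered
```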

Related

Cluster or unique index MySql DB?

I have one interesting question: what is the difference between a clustered index and a unique index? Which is better and faster, and why?
With MySQL's InnoDB engine, you get 3 choices of ordinary indexes:
PRIMARY KEY -- "clustered" with the data and "unique". (Often AUTO_INCREMENT)
UNIQUE -- unique, not clustered.
INDEX -- non unique, not clustered.
All are implemented as BTrees. Clustered means that the data is in the leaf node.
Considerations:
Any of the 3 choices can help with performance of searching for row(s).
InnoDB needs a PRIMARY KEY.
If you want the database to spit at you when you are trying to insert something that is already there, you need to specify the column(s) of the PRIMARY KEY or a UNIQUE.
Secondary keys (UNIQUE or INDEX) go through the PRIMARY KEY to find the data. Corollary 1: Finding a row by the PK is faster. Corollary 2: A bulky (e.g., VARCHAR(255)) PK is an extra burden on every secondary key, because each secondary index entry stores a copy of the PK.
A "covering" secondary key does not need to go beyond its BTree.
Data and indexes are cached in RAM in 16KB blocks (InnoDB's default page size); caching may be an important consideration in performance.
UUID/GUID indexes are terrible when the index cannot be fully cached -- due to high I/O.
An INSERT must immediately check the PRIMARY KEY and any UNIQUE key(s) for duplicate, but it can delay updating other secondary keys. (This may have impact on performance during inserts.)
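The "secondary key goes through the PRIMARY KEY" and "covering" points can be modelled in a few lines of Python. This is a sketch, not InnoDB's code. A dict stands in for the clustered B-tree, and a sorted list of (email, primary key) pairs stands in for the leaf level of a secondary index. The table and column names are hypothetical. A query that only needs columns stored in the secondary entry (the email and the PK) stops there. Anything else costs a second lookup in the clustered index.

```python
from bisect import bisect_left

# Clustered index: full rows keyed by primary key (a dict stands in for the B-tree).
clustered = {
    1: {"id": 1, "email": "ann@example.com", "city": "Oslo"},
    2: {"id": 2, "email": "bob@example.com", "city": "Rome"},
    3: {"id": 3, "email": "cat@example.com", "city": "Oslo"},
}

# Secondary index on email: leaf entries are (email, primary key), kept sorted.
email_index = sorted((row["email"], pk) for pk, row in clustered.items())

lookups = {"secondary": 0, "clustered": 0}

def find_by_email(email, columns):
    """Fetch `columns` for the row with this email, counting index traversals."""
    lookups["secondary"] += 1
    i = bisect_left(email_index, (email,))  # first entry whose email >= `email`
    if i == len(email_index) or email_index[i][0] != email:
        return None
    key, pk = email_index[i]
    available = {"email": key, "id": pk}    # what the secondary leaf holds
    if set(columns) <= set(available):
        return {c: available[c] for c in columns}  # covering: no second lookup
    lookups["clustered"] += 1                      # go through the PRIMARY KEY
    row = clustered[pk]
    return {c: row[c] for c in columns}

print(find_by_email("bob@example.com", ["id"]))    # {'id': 2}
print(find_by_email("bob@example.com", ["city"]))  # {'city': 'Rome'}
print(lookups)                                     # {'secondary': 2, 'clustered': 1}
```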
From those details, it is sometimes possible to deduce the answers to your questions.
(Caveat: Engines other than InnoDB have some different characteristics.)
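The point about INSERT (unique keys checked immediately, other secondary keys updated later) can be sketched as a toy Python class. It is only a loose analogy to InnoDB's change buffer. The class, its fields, and the merge-on-read policy are assumptions made for illustration. The key idea is that a uniqueness check forces an immediate look at the index, whereas a non-unique entry has nothing to verify and can be queued.

```python
class ToyTable:
    """Toy model (hypothetical): PK and UNIQUE keys are checked on every insert,
    while a non-unique secondary index is updated lazily from a pending buffer."""

    def __init__(self):
        self.pk = {}            # primary key -> row (checked immediately)
        self.unique_email = {}  # email -> pk (checked immediately)
        self.city_index = {}    # city -> set of pks (non-unique, deferred)
        self.pending = []       # buffered (city, pk) entries not yet merged

    def insert(self, pk, email, city):
        # Uniqueness cannot be deferred: the index must be read now to reject duplicates.
        if pk in self.pk:
            raise ValueError(f"duplicate primary key {pk}")
        if email in self.unique_email:
            raise ValueError(f"duplicate email {email}")
        self.pk[pk] = {"id": pk, "email": email, "city": city}
        self.unique_email[email] = pk
        # Nothing to verify for a non-unique key, so just remember the change.
        self.pending.append((city, pk))

    def merge(self):
        """Apply buffered changes to the non-unique index in one batch."""
        for city, pk in self.pending:
            self.city_index.setdefault(city, set()).add(pk)
        self.pending.clear()

    def find_by_city(self, city):
        self.merge()  # a reader must see buffered changes first
        return sorted(self.city_index.get(city, ()))

table = ToyTable()
table.insert(1, "ann@example.com", "Oslo")
table.insert(2, "bob@example.com", "Oslo")
print(len(table.pending))          # 2 (city index not touched yet)
print(table.find_by_city("Oslo"))  # [1, 2]
try:
    table.insert(3, "ann@example.com", "Rome")
except ValueError as exc:
    print("rejected:", exc)
```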

Is automatically indexing primary key really good?

In some DBMSs, like MySQL, the primary key is always indexed by default. I know indexing can make operations like selection and comparison of the indexed column much faster, but it can also slow down other operations like insertion and update. There are cases where there are few selections on the primary key of a table, in which indexing will not bring much benefit. In such cases, wouldn't it be better not to index the primary key?
Clarification: I just learned that a primary key is actually implemented by a special index, such as the clustered index in InnoDB. An index can definitely be used to enforce the uniqueness constraint of a primary key, but is it really necessary to use an index to do this? From what I know, an index is often implemented as a B-tree, which can improve the performance of many more operations than just checking uniqueness, which could simply be done with a hash table. So why not use other, simpler structures to enforce uniqueness that have less negative impact on the performance of insert and update operations?
The article here mentions a similar point:
Unique indexes use as much space as nonunique indexes do. The value of every column as well as the record's location is stored. This can be a waste if you use the unique index as a constraint and never as an index. Put another way, you may rely on the unique index to enforce uniqueness but never write a query that uses the unique value. In this case, there's no need for MySQL to store the locations of every record in the index: you'll never use them.
And in the following paragraph:
Unfortunately, there's no way to signal your intentions to MySQL. In the future, we'll likely find a feature introduced for this specific case. The MyISAM storage engine already has support for unique columns without an index (it uses a hash-based system), but the mechanism isn't exposed at the SQL level yet.
The "hash-based system" is an example of what I meant by "other simpler structures".
A primary key that isn't indexed is neither primary nor even a key.
Your question doesn't make sense.
Let's go back in history about 20 years, when MySQL was just getting started. The inventor said to himself, "what indexing system is simple and efficient and generally useful?" The answer was BTree. So, BTrees are all that existed for a long time in MySQL. Then he asked himself, "what bells and whistles should we put on the PRIMARY KEY?" The answer was KISS -- make it identical to other UNIQUE indexes. This was the MyISAM engine.
Later (about 15 years ago) another inventor joined forces. He brought the 'simple', yet transactional, InnoDB engine. Since transactions really need a PK, InnoDB has a PK that is UNIQUE and clustered. And, again, the data+PK is a BTree.
Every so often someone would ask, "Do we need bitmap indexes, hash indexes, a second clustered index, etc.?" The answer always came back, "No, BTree is good enough." A few non-MySQL engines have been invented to do non-BTree indexes. Perhaps the most successful is Tokutek and its "Fractal Tree" index. MariaDB now includes TokuDB. Another is the "columnar indexing" of InfiniDB.
(Apologies to Monty and Heikki if they did not actually ask those questions.)
Hash and BTree indexes are about equally fast for "point queries". But for "range queries", Hash is useless and BTree is excellent. Why implement both when one is clearly better?
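To illustrate the point/range difference, here is a Python sketch. A dict plays the hash index, and a sorted list searched with the bisect module plays the ordered leaves of a BTree. The keys and rows are made up. Both answer a point query easily, but only the ordered structure can jump straight to the start of a range. The hash version has to examine every entry.

```python
from bisect import bisect_left, bisect_right

keys = [3, 8, 15, 21, 42, 57, 73, 99]  # hypothetical indexed values
rows = {k: f"row-{k}" for k in keys}

# "Hash index": a dict finds one key directly but keeps no useful key order.
hash_index = dict(rows)

def hash_range(lo, hi):
    # No order to exploit: every entry must be examined.
    return [hash_index[k] for k in sorted(k for k in hash_index if lo <= k <= hi)]

# "BTree": keys kept sorted, so a range is a contiguous slice.
sorted_keys = sorted(rows)

def btree_range(lo, hi):
    start = bisect_left(sorted_keys, lo)  # first key >= lo
    stop = bisect_right(sorted_keys, hi)  # one past the last key <= hi
    return [rows[k] for k in sorted_keys[start:stop]]

print(hash_index[42])       # row-42 (point query)
print(btree_range(42, 42))  # ['row-42']
print(btree_range(10, 60))  # ['row-15', 'row-21', 'row-42', 'row-57']
print(hash_range(10, 60) == btree_range(10, 60))  # True, but it scanned all 8 keys
```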

Decision to use KEY or UNIQUE KEY

I understand that UNIQUE KEY is a unique index and KEY is a non-unique index. I have read that with unique indexes, inserting data might result in some I/O.
If we don't have to rely on the DB for uniqueness and we still want fast lookups using column 'b', would you suggest using a non-unique index (KEY) instead of a unique index (UNIQUE KEY)?
Both unique and non-unique indexes result in I/O operations for INSERT, DELETE, and UPDATE statements. The amount of index overhead should be pretty much the same. The difference is that unique indexes might result in a failure of an INSERT or UPDATE under normal use (of course, the operations might fail for other reasons, such as disk being full, but that is an unusual circumstance).
I don't understand this statement: "If we don't have to rely on the DB for unique-ness". A UNIQUE attribute in a table is a description of the column/columns that comprise the key. One of the functions of a database is to maintain the integrity of the data, so let the database do what it is designed for.
As for performance, I don't think there is a significant difference between unique and non-unique indexes. Unique indexes may be slightly more optimized for certain operations because the compiler knows that a single lookup returns only one row. The difference between an index lookup and an index scan that returns one row is probably pretty small in practice.
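The "single lookup returns only one row" difference can be shown with a small Python sketch over a sorted list of (key, primary key) entries. The list stands in for an index's leaf level, and the data is hypothetical. With a unique index, the search can stop at the first match. With a non-unique index, it has to look at one more entry to confirm the key has changed, which is why the difference is usually small.

```python
from bisect import bisect_left

def lookup(index, key, unique):
    """Return (matching primary keys, entries examined) for `key` in a sorted
    list of (key, pk) entries. With unique=True the scan stops at the first
    match, because the constraint guarantees there cannot be another."""
    i = bisect_left(index, (key,))  # first entry whose key >= `key`
    matches, examined = [], 0
    while i < len(index):
        examined += 1
        k, pk = index[i]
        if k != key:
            break
        matches.append(pk)
        if unique:
            break
        i += 1
    return matches, examined

unique_idx = [("ann", 1), ("bob", 2), ("cat", 3)]      # e.g. a UNIQUE KEY
city_idx = [("Oslo", 1), ("Oslo", 3), ("Rome", 2)]     # e.g. a plain KEY

print(lookup(unique_idx, "bob", unique=True))   # ([2], 1)
print(lookup(unique_idx, "bob", unique=False))  # ([2], 2): one extra entry checked
print(lookup(city_idx, "Oslo", unique=False))   # ([1, 3], 3)
```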

Using Primary Keys as Index

In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?
Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.
Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates, fields that have no uniqueness at all are useless to index. The sweet spot is the cardinality in the middle (.5).
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
Primary keys will have an index only if you formally define them as primary keys. Where most people forget to make indexes is on foreign keys. These are not generally indexed automatically (PostgreSQL, for example, does not create one, although MySQL's InnoDB does create one if no suitable index exists), yet they are almost always involved in joins and so benefit from an index. Other candidates for indexes are things you frequently filter data on that have a large number of possible values, such as names, part numbers, start dates, etc.
1) Is it a good idea to make your primary key an index? (assuming the primary key is unique, an id)
All DBMSes I know of will automatically create an index underneath the PK.
In case of MySQL/InnoDB, PK will not just be indexed, but that index will be clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
2) how would I know what columns to index ?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed-up the SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow-down the INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
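You can watch an index change the plan with Python's sqlite3 module and EXPLAIN QUERY PLAN. This is SQLite rather than MySQL (MySQL has its own EXPLAIN with different output, and the exact plan wording also varies between SQLite versions). The table T, column FOO, and index name idx_foo are hypothetical. Before the index exists, the plan is a scan of the whole table. Afterwards, it is a search using idx_foo. The flip side, not measured here, is that every later INSERT into T must also update idx_foo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, foo INTEGER, bar TEXT)")
conn.executemany("INSERT INTO t (foo, bar) VALUES (?, ?)",
                 [(i % 1000, f"bar{i}") for i in range(10_000)])

def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT * FROM t WHERE foo = ?"
before = plan(query, (42,))
print(before)  # a full SCAN of t

conn.execute("CREATE INDEX idx_foo ON t (foo)")  # every later INSERT must maintain this
after = plan(query, (42,))
print(after)   # a SEARCH of t using idx_foo
```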
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.
Your primary key will always be an index.
Always create indexes on columns that help to reduce the search; for example, if a column has only 3 different values among more than a thousand rows, it is a good sign to index it.

In InnoDB, are columns which are not part of index stored in sorted order as well?

I am using InnoDB. My Index selectivity (cardinality / total-rows) is < 100%, roughly 96-98%.
I would like to know if the columns, which are not part of the keys, are also stored in sorted order. This influences my tables' design.
I would also be interested to understand how much performance degradation in lookups I can expect when index selectivity is < 100%.
(I have these questions because, for InnoDB, it's only mentioned that indexes are clustered and that a TID/RP is stored after the index.)
No, it doesn't matter for the order of the non-keyed columns.
The answer to your second is more complex - I'm going to walk through it since I think you might be misunderstanding InnoDB a little -
There are two types of indexes, primary and secondary.
The primary key index is clustered - that is, data is stored in the leaves of the B+Tree. Looking up by primary key is just one tree traversal, and you've got the row(s) you're looking for.
Looking up by secondary key requires searching through the secondary key, finding the primary key rows that match, and then looking through the primary key to get the data.
You only need to worry about the selectivity of secondary (non-clustered) indexes, since the primary (clustered) index will always have a selectivity of 1. How selective a secondary index needs to be varies a lot. For one, it depends on the width of the index versus the width of the row. It also depends on whether your data fits in memory: if secondary keys don't "follow" the primary key order, looking up each row in the clustered index may cause random I/O.
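For the second question, a back-of-the-envelope calculation helps. If selectivity is distinct values divided by rows, then on average each distinct value matches 1 / selectivity rows, and each of those rows needs its own lookup in the clustered index. The Python sketch below does only that arithmetic. It assumes values are spread evenly, so heavily skewed data (a few values with many rows) can be much worse than the average suggests.

```python
def avg_rows_per_value(selectivity):
    """Average number of rows sharing one indexed value (selectivity = distinct / total)."""
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    return 1 / selectivity

# 1.0 is a unique index; 0.98 and 0.96 bracket the range from the question.
for s in (1.0, 0.98, 0.96):
    print(f"selectivity {s:.2f}: about {avg_rows_per_value(s):.2f} rows per lookup")
# selectivity 1.00: about 1.00 rows per lookup
# selectivity 0.98: about 1.02 rows per lookup
# selectivity 0.96: about 1.04 rows per lookup
```

So at 96 to 98% selectivity, an average lookup fetches only about 2 to 4% more rows than it would through a unique index.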