I understand that UNIQUE KEY is a unique index and KEY is a non-unique index. I have read that in the case of unique indexes, inserting data might result in some I/O.
If we don't have to rely on the DB for uniqueness and we still want fast lookups using column 'b', would you suggest using a non-unique index (KEY) instead of a unique index (UNIQUE KEY)?
Both unique and non-unique indexes result in I/O operations for INSERT, DELETE, and UPDATE statements. The amount of index overhead should be pretty much the same. The difference is that unique indexes might result in a failure of an INSERT or UPDATE under normal use (of course, the operations might fail for other reasons, such as disk being full, but that is an unusual circumstance).
I don't understand this statement: "If we don't have to rely on the DB for unique-ness". A UNIQUE attribute in a table is a description of the column/columns that comprise the key. One of the functions of a database is to maintain the integrity of the data, so let the database do what it is designed for.
As for performance, I don't think there is a significant difference between unique and non-unique indexes. Unique indexes may be slightly more optimized for certain operations because the query optimizer knows that a single lookup returns at most one row. The difference between an index lookup and an index scan that returns one row is probably pretty small in practice.
This describes different indexes:
KEY or INDEX refers to a normal non-unique index. Non-distinct values
for the index are allowed, so the index may contain rows with
identical values in all columns of the index. These indexes don't
enforce any restraints on your data so they are used only for making
sure certain queries can run quickly.
UNIQUE refers to an index where all rows of the index must be unique.
That is, the same row may not have identical non-NULL values for all
columns in this index as another row. As well as being used to speed
up queries, UNIQUE indexes can be used to enforce restraints on data,
because the database system does not allow this distinct values rule
to be broken when inserting or updating data.
I understand the benefit to application logic (you don't want uniqueness check) but is there also a performance improvement? Specifically, how much faster are writes using INDEX instead of UNIQUE?
UNIQUE KEY is a constraint, and you use it when you want to enforce that constraint.
KEY is an index, which you pick to make certain queries more efficient.
The performance of inserting into a table with either type of index is virtually the same. That is, the difference, if any, is so minor that it's not worth picking one over the other for the sake of performance.
Choose the type of index to support your constraints. Use UNIQUE KEY if and only if you want to enforce uniqueness. Use KEY otherwise.
Your question is like asking, "which is faster, a motorcycle or a speedboat?" They are used in different situations, so judging them on their speed isn't the point.
INSERT
When a row is inserted, all unique keys (PRIMARY and UNIQUE) are immediately checked for duplicate keys. This is so that you get an error on the INSERT if necessary. The updating of non-unique INDEXes is delayed (for discussion, see "Change buffering"). The work will be done in the background so your INSERT won't be waiting for it.
So, there is a slight overhead in UNIQUE for inserting. But, as already pointed out, if you need the uniqueness constraint, then use it.
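The behavioural difference on INSERT can be sketched with a small script. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (so it says nothing about InnoDB's change buffering or timing), and the table and index names are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names below are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_plain  (a INTEGER, b INTEGER);
    CREATE TABLE t_unique (a INTEGER, b INTEGER);
    CREATE INDEX        idx_plain_b  ON t_plain  (b);  -- roughly KEY (b)
    CREATE UNIQUE INDEX idx_unique_b ON t_unique (b);  -- roughly UNIQUE KEY (b)
""")

def try_insert(table, a, b):
    """Return True if the row went in, False if the uniqueness check rejected it."""
    try:
        conn.execute(f"INSERT INTO {table} (a, b) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False

# Insert the same value of b twice into each table.
plain_results = [try_insert("t_plain", 1, 42), try_insert("t_plain", 2, 42)]
unique_results = [try_insert("t_unique", 1, 42), try_insert("t_unique", 2, 42)]

print(plain_results)   # both inserts are accepted
print(unique_results)  # the second insert is rejected as a duplicate
```

The duplicate check has to happen before the INSERT can report success, which is why the unique index cannot defer its work the way a non-unique one can.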
SELECT
Any kind of index (PRIMARY, UNIQUE, or INDEX) may be used to speed up a SELECT. Mostly, the types of index work identically. However, with PRIMARY and UNIQUE, the optimizer knows that there will be only one (or possibly zero) rows matching a given value, so it can fetch the one row, then quit. For a non-unique index, there could be more than one row, so it keeps scanning the index, checking for more rows. This scan stops after peeking at the first non-matching row. So, there is a small (very small) overhead for non-unique indexes versus unique.
Bottom Line
The performance issues are less important than the semantics (uniqueness constraint vs. not).
I am going to use a unique key for my messages table, which will be a combination of three columns: fromid, localid and timestamp.
My question is: will message insertion get slow, or will it perform well?
Compared to what?
Having a primary key or unique index is always going to affect message insertion time, because column values need to be compared to the values in the index. In most environments, the unique index can fit in memory, so this is just a few comparison operations and an insert -- nothing to worry about and much less than either network overhead or disk i/o.
If you have a really large table compared to available memory, then the operations could start to take more time.
If your application needs to enforce this unique index, then you should use it. You probably won't notice the additional overhead for enforcing uniqueness, unless you do very intense performance tests.
In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?
Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.
Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates, fields that have no uniqueness at all are useless to index. The sweet spot is the cardinality in the middle (.5).
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
Primary keys will have an index only if you formally define them as primary keys. Where most people forget to create indexes is on foreign keys, which are generally not indexed automatically, will almost always be involved in joins, and thus should be indexed. Other candidates for indexes are things you frequently filter data on that have a large number of possible values, such as names, part numbers, start dates, etc.
1) Is it a good idea to make your primary key as an Index? (assuming the primary key is unique, an id)
All DBMSes I know of will automatically create an index underneath the PK.
In case of MySQL/InnoDB, PK will not just be indexed, but that index will be clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
2) how would I know what columns to index ?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed-up the SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow-down the INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
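As a rough illustration of the SELECT side of that tradeoff, the sketch below asks the database how it plans to run the query before and after the index exists. It assumes SQLite (via Python's sqlite3 module) as a stand-in for whichever DBMS you use; the index name IDX_T_FOO and the ID column are hypothetical, and the exact plan wording varies between SQLite versions. It does not measure the INSERT slowdown.

```python
import sqlite3

# Assumption: SQLite stands in for the real DBMS; IDX_T_FOO is a made-up name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, FOO INTEGER)")
conn.executemany("INSERT INTO T (FOO) VALUES (?)",
                 [(i % 100,) for i in range(1000)])

def plan(sql):
    """Return the human-readable plan text the database reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

query = "SELECT * FROM T WHERE FOO = 7"

plan_without_index = plan(query)   # a scan of the whole table
conn.execute("CREATE INDEX IDX_T_FOO ON T (FOO)")
plan_with_index = plan(query)      # a search that names the new index

print(plan_without_index)
print(plan_with_index)
```

MySQL and PostgreSQL offer the same kind of check through their own EXPLAIN statements, which is usually the first thing to look at when deciding whether an index is earning its keep.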
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.
Your primary key will always be an index.
Create indexes on columns that help to narrow the search. For example, a column with only 3 different values among more than a thousand rows narrows the search very little, so it is usually a poor candidate; columns with many distinct values benefit far more from an index.
What is the performance characteristic of unique indexes in MySQL, and of indexes in general in MySQL (like the primary key index):
Given I will insert or update a record in my database: will the speed of updating the record (= building/updating the indexes) be different if the table has 10 thousand records compared to 100 million records? Or, said differently, does the index building time after changing one row depend on the total index size?
Does this also apply for any other indexes in Mysql like the Primary Key index?
Thank you very much
Tom
Most indexes in MySQL are really the same internally -- they're B-tree data structures. As such, updating a B-tree index is an O(log n) operation. So it does cost more as the number of entries in the index increases, but not badly.
In general, the benefit you get from an index far outweighs the cost of updating it.
A typical MySQL implementation of an index is as a set of sorted values (not sure if any storage engine uses different strategies, but I believe this holds for the popular ones) -- therefore, updating the index inevitably takes longer as it grows. However, the slow-down need not be all that bad -- locating a key in a sorted index of N keys is O(log N), and it's possible (though not trivial) to make the update O(1) (at least in the amortized sense) after the finding. So, if you square the number of records as in your example, and you pick a storage engine with highly optimized implementation, you could reasonably hope for the index update to take only twice as long on the big table as it did on the small table.
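The "only twice as long" estimate is just logarithm arithmetic, which can be checked directly. The sketch below also counts B-tree levels for a hypothetical fanout of 100 keys per node; that fanout is an assumption chosen for round numbers, not a figure from any MySQL storage engine.

```python
import math

small_table = 10_000          # 10 thousand rows
large_table = 100_000_000     # 100 million rows (the small count squared)

# O(log N) lookup cost: squaring N doubles log N.
cost_ratio = math.log(large_table) / math.log(small_table)

def btree_levels(n_keys, fanout):
    """Count levels needed to address n_keys with `fanout` keys per node."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels

# Hypothetical fanout; real nodes hold however many keys fit in a page.
FANOUT = 100
small_levels = btree_levels(small_table, FANOUT)
large_levels = btree_levels(large_table, FANOUT)

print(round(cost_ratio, 6))          # about 2
print(small_levels, large_levels)    # 2 levels versus 4 levels
```

So a table 10,000 times larger needs only a couple more levels to traverse per update under these assumptions.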
Note that if new primary key values are always larger than the previous (i.e. autoincrement integer field), your index will not need to be rebuilt.
What is the difference between MySQL unique and non-unique index in terms of performance?
Let us say I want to make an index on a combo of 2 columns, and the combination is unique, but I create a non-unique index. Will that have any significant effect on the performance or the memory MySQL uses?
Same question: is there a difference between a primary key and a unique index?
UNIQUE and PRIMARY KEY are constraints, not indexes. Though most databases implement these constraints by using an index. The additional overhead of the constraint in addition to the index is insignificant, especially when you count the cost of tracking down and correcting unintentional duplicates when (not if) they occur.
Indexes are usually more effective if you have high selectivity. This is the ratio of the number of distinct values to the total number of rows.
For example, in a column for Social Security Number, you may have 1 million rows with 1 million distinct values. So the selectivity is 1000000/1000000 = 1.0 (although there are rare historical exceptions, SSN's are intended to be unique).
But another column in that table, "gender" may only have two distinct values over 1 million rows. 2/1000000 = very low selectivity.
An index with a UNIQUE or PRIMARY KEY constraint is guaranteed to have a selectivity of 1.0, so it will always be as effective as an index can be.
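The selectivity ratio described above is simple to compute. Here is a minimal sketch in Python; the function name and the sample data are made up for illustration, with the two columns mimicking the SSN and gender examples.

```python
def selectivity(values):
    """Ratio of distinct values to total rows, between 0 (exclusive) and 1."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)

# Hypothetical columns: one all-distinct, one with only two distinct values.
ssn_like = range(1_000_000)
gender_like = ["F", "M"] * 500_000

ssn_selectivity = selectivity(ssn_like)        # 1000000 / 1000000
gender_selectivity = selectivity(gender_like)  # 2 / 1000000

print(ssn_selectivity)
print(gender_selectivity)
```

On a real table the same number comes from dividing COUNT(DISTINCT col) by COUNT(*).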
You asked about the difference between a primary key and a unique constraint. Chiefly, it's that you can have only one primary key constraint per table (even if that constraint's definition includes multiple columns), whereas you can have multiple unique constraints. A column with a unique constraint may permit NULLs, whereas columns in primary key constraints must not permit NULLs. Otherwise, primary key and unique are very similar in their implementation and their use.
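The NULL behaviour of unique constraints can be seen in a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in, since SQLite, like MySQL, lets a unique column hold several NULLs; the users table and its columns are hypothetical. SQLite's own primary key rules differ in some details from MySQL's, so the primary key side is not demonstrated here.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and columns are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,   -- only one primary key per table
        email TEXT UNIQUE,           -- first unique constraint
        phone TEXT UNIQUE            -- second unique constraint, nullable
    )
""")

# Two rows with NULL phone: allowed, NULLs do not collide with each other.
conn.execute("INSERT INTO users (email, phone) VALUES (?, ?)", ("a@example.com", None))
conn.execute("INSERT INTO users (email, phone) VALUES (?, ?)", ("b@example.com", None))

# A repeated non-NULL email violates the unique constraint.
try:
    conn.execute("INSERT INTO users (email, phone) VALUES (?, ?)", ("a@example.com", "555"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, row_count)
```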
You asked in a comment about whether to use MyISAM or InnoDB. In MySQL, these are called storage engines. There are a bunch of subtle differences between these two storage engines, but the chief ones are:
InnoDB supports transactions, so you can choose to roll back or commit changes. MyISAM is effectively always autocommit.
InnoDB enforces foreign key constraints. MyISAM doesn't enforce or even store foreign key constraints.
If these features are things you need in your application, then you should use InnoDB.
To respond to your comment, it's not that simple. InnoDB is actually faster than MyISAM in quite a few cases, so it depends on your application's mix of selects, updates, concurrent queries, indexes, buffer configuration, etc.
See http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/ for a very thorough performance comparison of the storage engines. InnoDB wins over MyISAM frequently enough that it's clearly not possible to say one is faster than the other.
As with most performance-related questions, the only way to answer it for your application is to test both configurations using your application and a representative sample of data, and measure the results.
On a non-unique index that just happens to be unique and a unique index? I'm not sure, but I'd guess not a lot. The optimiser should examine the cardinality of the index and use that (it will always be the number of rows, for a unique index).
As far as a primary key is concerned, probably quite a lot, but it depends which engine you use.
The InnoDB engine (which is used by many people) always clusters rows on the primary key. This means that the PK is essentially combined with the actual row data. If you're doing a lot of lookups by PK (or indeed, range scans etc), this is a Good Thing, because it means that it won't need to fetch as many blocks from the disc.
A non-PK unique index will never be clustered in InnoDB.
On the other hand, some other engines (MyISAM in particular) don't cluster the PK, so the primary key is just like a normal unique index.