Is automatically indexing the primary key really good? - mysql

In some DBMSs, such as MySQL, the primary key is always indexed by default. I know indexing can make operations like selecting and comparing on the indexed column much faster, but it can also slow down other operations like inserts and updates. There are cases where a table sees few selections on its primary key, and indexing will not bring much benefit. In such cases, wouldn't it be better not to index the primary key?
Clarification: I just learned that the primary key is actually implemented by a special index, such as the clustered index in InnoDB. An index can definitely be used to enforce the uniqueness constraint of a primary key, but is it really necessary to use an index to do this? From what I know, an index is often implemented as a B-tree, which improves the performance of many more operations than just checking uniqueness, and uniqueness alone can be checked with a simple hash table. So why not use other, simpler structures to enforce uniqueness that have less negative impact on the performance of insert and update operations?
An article on this topic mentions a similar point:
Unique indexes use as much space as nonunique indexes do. The value of every column as well as the record's location is stored. This can be a waste if you use the unique index as a constraint and never as an index. Put another way, you may rely on the unique index to enforce uniqueness but never write a query that uses the unique value. In this case, there's no need for MySQL to store the locations of every record in the index: you'll never use them.
And in the following paragraph:
Unfortunately, there's no way to signal your intentions to MySQL. In the future, we'll likely find a feature introduced for this specific case. The MyISAM storage engine already has support for unique columns without an index (it uses a hash-based system), but the mechanism isn't exposed at the SQL level yet.
The "hash-based system" is an example of what I meant by "other simpler structures".
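To make the article's distinction concrete, here is a toy model contrasting what a pure uniqueness constraint needs to keep with what a unique index keeps. It is a sketch under obvious assumptions: Python's `set` and `dict` stand in for on-disk structures, and the class names are hypothetical, not MySQL internals.

```python
# Toy model only: set/dict stand in for on-disk structures,
# and the class names are hypothetical, not MySQL internals.

class UniqueConstraintOnly:
    """Enforces uniqueness by remembering key values and nothing else."""

    def __init__(self):
        self._seen = set()

    def insert(self, key, row_location):
        # row_location is accepted but deliberately not stored:
        # a pure constraint never needs to find the row again.
        if key in self._seen:
            raise ValueError(f"duplicate key {key!r}")
        self._seen.add(key)


class UniqueIndex:
    """Enforces uniqueness and also records where each row lives."""

    def __init__(self):
        self._locations = {}

    def insert(self, key, row_location):
        if key in self._locations:
            raise ValueError(f"duplicate key {key!r}")
        self._locations[key] = row_location

    def lookup(self, key):
        # Only possible because the row location was kept.
        return self._locations.get(key)
```

Both reject a second `insert("alice", ...)`, but only `UniqueIndex` can answer `lookup("alice")`; the extra per-entry location is exactly the space the article calls wasted when the index is never queried.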

A primary key that isn't indexed is neither primary nor even a key.
Your question doesn't make sense.

Let's go back in history about 20 years, to when MySQL was just getting started. The inventor said to himself, "What indexing system is simple, efficient, and generally useful?" The answer was BTree. So BTrees were all that existed in MySQL for a long time. Then he asked himself, "What bells and whistles should we put on the PRIMARY KEY?" The answer was KISS -- make it identical to other UNIQUE indexes. This was the MyISAM engine.
Later (about 15 years ago), another inventor joined forces. He brought the 'simple', yet transactional, InnoDB engine. Since transactions really need a PK, InnoDB has a PK that is UNIQUE and clustered. And, again, the data+PK is a BTree.
Every so often someone would ask, "Do we need bitmap indexes, hash indexes, a second clustered index, etc.?" The answer always came back, "No, BTree is good enough." A few non-MySQL engines have been invented to do non-BTree indexes. Perhaps the most successful is Tokutek's TokuDB and its "Fractal Tree" index. MariaDB now includes TokuDB. Another is the "columnar indexing" of InfiniDB.
(Apologies to Monty and Heikki if they did not actually ask those questions.)
Hash and BTree indexes are about equally fast for "point queries". But for "range queries", Hash is useless and BTree is excellent. Why implement both when one is clearly better?
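The point-vs-range argument can be sketched in a few lines. This is a toy, not an engine: a `dict` plays the hash index and a sorted list searched with `bisect` plays the BTree; the key values are made up.

```python
import bisect

# Toy stand-ins: a dict plays the hash index, a sorted list plays the BTree.
keys = [42, 7, 19, 88, 3, 56, 23]
hash_index = {k: f"row-{k}" for k in keys}
btree_like = sorted(keys)


def point_query_hash(k):
    # Average O(1): hash straight to the entry.
    return hash_index.get(k)


def point_query_btree(k):
    # O(log n): binary search, analogous to descending a BTree.
    i = bisect.bisect_left(btree_like, k)
    if i < len(btree_like) and btree_like[i] == k:
        return f"row-{k}"
    return None


def range_query_btree(lo, hi):
    # One search finds the start; matching keys are then adjacent and in order.
    start = bisect.bisect_left(btree_like, lo)
    end = bisect.bisect_right(btree_like, hi)
    return btree_like[start:end]


def range_query_hash(lo, hi):
    # A hash keeps no order, so every key has to be examined.
    return sorted(k for k in hash_index if lo <= k <= hi)
```

Both answer `19` as a point query cheaply, and both return `[19, 23, 42, 56]` for the range 10..60, but only the sorted structure gets there without touching every key.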


Why does InnoDB require clustered index upon creating a table?

Even if I don't have a primary key or unique key, InnoDB still creates a clustered index on a synthetic column, as described here:
https://dev.mysql.com/doc/refman/5.5/en/innodb-index-types.html
So, why does InnoDB require a clustered index? Is there a definite reason a clustered index must exist here?
Oracle Database and MSSQL don't seem to require this.
Also, I don't think a clustered index has such a tremendous advantage over an ordinary table either.
It is true that looking up data by the clustering key does not need an additional disk read and is faster than without one, but without a clustered index, a secondary index can look up rows faster by using the physical rowID.
Therefore, I don't see any reason for insisting using it.
Other vendors have a "ROWNUM" or something like that. InnoDB is much simpler. Instead of having that animal, it simply requires something that you will usually want anyway. In both cases, it is a value that uniquely identifies a row. This is needed for the guts of transactions -- knowing which row(s) to lock, etc., to provide transactional integrity. (I won't go into the rationale here.)
In requiring (or providing) a PK, and in making certain other simplifications, InnoDB sacrifices several little-used (or easily worked around) features: multiple PKs, multiple clustered indexes, no PK, etc.
Since the "synthetic column" takes 6 bytes, it is almost always better to simply provide id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY (only 4 bytes), even if you don't use it. But if you wouldn't use such an id and do have a non-NULL UNIQUE key, then you may as well make that key the PK. (MySQL does this by default.)
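The selection rule alluded to here (and in the linked manual page) can be modeled as a small function. This is a sketch of the documented InnoDB behavior, not InnoDB code: use the PRIMARY KEY if declared, otherwise the first UNIQUE index whose columns are all NOT NULL, otherwise a hidden `GEN_CLUST_INDEX` on the 6-byte `DB_ROW_ID`. The function name and the tuple shape of its inputs are hypothetical.

```python
def choose_clustered_index(primary_key, unique_indexes):
    """Model of how InnoDB picks the clustered index for a table.

    primary_key: list of column names, or None if no PRIMARY KEY is declared.
    unique_indexes: list of (name, columns, all_not_null) in definition order.
    """
    if primary_key:
        return ("PRIMARY", primary_key)
    for name, columns, all_not_null in unique_indexes:
        if all_not_null:
            # The first UNIQUE index whose columns are all NOT NULL is used.
            return (name, columns)
    # Fallback: hidden index on a synthetic 6-byte row ID.
    return ("GEN_CLUST_INDEX", ["DB_ROW_ID"])
```

For example, a table with no PK, a nullable unique `email`, and a NOT NULL unique `ssn` would be clustered on `ssn`, which is why declaring that key as the PK yourself costs nothing extra.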
A lookup by a secondary key first gets the PK value from the secondary key's BTree. Then the main BTree (with the data ordered by the PK) is drilled down to find the row. Hence, secondary key lookups can be slower than lookups by the PK. (Usually this is not enough slower to matter.) So this points out one design decision that required a PK. (Other vendors use ROWNUM, or something similar, to locate the record instead of the PK.)
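The two lookup paths described above can be modeled with plain dictionaries. This is a toy: the table, column, and index names are hypothetical, and because every `dict` access is O(1) here, the model only shows what each index stores and how many searches happen, not real costs.

```python
# Toy model; table, column and index names are hypothetical.
ROWS = [
    {"id": 1, "email": "a@example.com", "name": "Ann"},
    {"id": 2, "email": "b@example.com", "name": "Bob"},
]

# InnoDB style: the data lives inside the PK-ordered (clustered) structure,
# and a secondary index stores the PK value rather than a physical address.
clustered = {row["id"]: row for row in ROWS}
email_to_pk = {row["email"]: row["id"] for row in ROWS}


def innodb_style_pk_lookup(pk):
    return clustered[pk]                 # one search: the PK tree holds the row


def innodb_style_secondary_lookup(email):
    pk = email_to_pk[email]              # search 1: secondary index -> PK value
    return clustered[pk]                 # search 2: drill down the PK tree


# ROWNUM/heap style: rows sit in a heap addressed by position,
# and a secondary index stores that position directly.
heap = list(ROWS)
email_to_rowid = {row["email"]: pos for pos, row in enumerate(heap)}


def heap_style_secondary_lookup(email):
    rowid = email_to_rowid[email]        # search 1: secondary index -> row id
    return heap[rowid]                   # then jump straight to the row
```

In the InnoDB-style path, a PK lookup is a single search while a secondary lookup is two; in the heap-style path, the secondary index points at the row directly, at the price of having no clustered PK.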
Back to "Why?". There are many decisions in MySQL where the designers said, "Simplicity is better for this free product; let's not bother building some complex but little-used feature." At first there were no subqueries (temp tables were a workaround). No views (they are only syntactic sugar). No materialized views (OK, this may be a failing, but they can be simulated). No bit-mapped or hash or ISAM (etc.) indexing (BTree is very good for "all-around" usage).
Also, by always "clustering" the PK with the data, lookups via the PK are inherently faster than the competition (no going through a ROWNUM). (Secondary key lookups may not be faster.)
Another difference -- MySQL was very late in implementing "index merge", wherein it uses two indexes, then ANDs or ORs the results. This can be efficient with ROWNUMs, but not with clustered PKs.
(I'm not a MySQL/MariaDB/Percona developer, but I have used them since 1999, and have been to virtually all major MySQL Conferences, where inside info is often divulged. So, I think I have enough insight into their thinking to present this answer.)

Using Primary Keys as Index

In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?
Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.
Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates; fields that have no uniqueness at all are useless to index. The sweet spot is a cardinality in the middle (.5).
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
Primary keys will have an index only if you formally define them as primary keys. Where most people forget to make indexes is on foreign keys, which are generally not indexed automatically, yet are almost always involved in joins and thus should be indexed. Other candidates for indexes are columns you frequently filter data on that have a large number of possible values, such as names, part numbers, start dates, etc.
1) Is it a good idea to make your primary key as an Index?(assuming the primary key is unique,an id
All DBMSes I know of will automatically create an index underneath the PK.
In case of MySQL/InnoDB, PK will not just be indexed, but that index will be clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
2) how would I know what columns to index ?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed-up the SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow-down the INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
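The tradeoff can be sketched with a toy table `T` that optionally keeps a sorted index on `FOO`. This is an illustrative model, not how a real engine stores data: the `Table` class is hypothetical, and a sorted Python list maintained with `bisect.insort` stands in for a BTree.

```python
import bisect


class Table:
    """Toy table T: rows in a list, plus an optional sorted index on FOO."""

    def __init__(self, index_foo):
        self.rows = []
        # Sorted list of (foo, row_position) pairs, or None if FOO is unindexed.
        self.foo_index = [] if index_foo else None

    def insert(self, foo, other):
        self.rows.append((foo, other))
        if self.foo_index is not None:
            # The extra work every INSERT pays when FOO is indexed.
            bisect.insort(self.foo_index, (foo, len(self.rows) - 1))

    def select_where_foo(self, foo):
        if self.foo_index is None:
            # No index: scan every row.
            return [row for row in self.rows if row[0] == foo]
        # Indexed: binary-search to the first (foo, position) pair, then
        # walk forward while the key still matches. Positions are >= 0,
        # so (foo, -1) sorts before every real entry for this key.
        i = bisect.bisect_left(self.foo_index, (foo, -1))
        matches = []
        while i < len(self.foo_index) and self.foo_index[i][0] == foo:
            matches.append(self.rows[self.foo_index[i][1]])
            i += 1
        return matches
```

Both variants return the same rows; the indexed one pays an extra sorted insert on every `insert` call and, in exchange, avoids scanning all rows in `select_where_foo`.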
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.
Your primary key will always be an index.
Always create indexes on columns that help narrow the search. For example, if a column has only 3 different values among more than a thousand rows, that is a good sign to index it.

What is the use of Mysql Index key?

Hi, I am a newbie to MySQL.
Here are my questions:
What is the use of Mysql Index key?
Does it make a difference in mysql queries with defining an Index key and without it?
Are all primary keys default Index key?
Thanks a million
1- Defining an index on a column (or set of columns) makes searching on that column (or set) much faster, at the expense of additional disk space.
2- Yes, the difference is that queries using that column will be much faster.
3- Yes, as it's usual to search by the primary key, it makes sense for that column to always be indexed.
Read more on MySQL indexing here.
An index is indeed an additional set of records. Nothing more.
Things that make indexes access faster are:
Internally, the engine is more likely to keep the index in its buffer than all of the table's rows
The index is smaller, so scanning it means reading fewer blocks from the hard drive
The index is sorted already, so finding a given value is easy
If the column is NOT NULL, it's even faster (for various reasons; one often cited is that some engines, such as Oracle, don't store entirely-NULL keys in B-tree indexes, although MySQL's InnoDB does store NULLs in its indexes)
Whether or not an index is useful is not easy to guess (obviously I'm not talking about the primary key) and should be investigated. Here are some downsides, cases where an index might slow down your operations:
It will slow down inserts and updates on indexed fields
It requires more maintenance: statistics have to be built for each index, so computing them could take significantly longer if you add many indexes
It might slow down queries when the statistics are not up to date. This effect could be catastrophic, because the engine would actually go "the wrong way"
It might slow things down when the queries are inadequate (in any case, indexes should not be the rule but the exception: no index unless certain queries urgently need one. I know every table usually has at least one index, but that came after investigation)
We could say a lot more about this last point, but I think every case is special, and many examples already exist on the internet.
Now about your question 'Are all primary keys default Index key?': that is not how the optimizer works. When a table has several indexes, the optimizer chooses the most efficient index combination on the fly, based on the query's data and some static data (index statistics), in order to reach the best performance. There's no default index per se; every situation leads to a different result.
Regards.

Are Multi-column Primary Keys in MySQL an optimisation problem?

I've been looking into using multi-column primary keys, and as performance is extremely important given the volume of traffic and the size of the database, I need to know if there is anything to consider before I start throwing out the unique ID method on many of my tables and start using multi-column primary keys.
So, what are the performance/optimisation pros/cons to using multi column primary keys versus a basic single column, auto-inc primary key?
Is there a particular reason that you need/want to use multi-column keys instead of an (I assume) already created single-column key?
One of the problems with Natural Keys is dealing with cascading an update to the key value across all the foreign keys. A surrogate key such as an auto-increment column avoids this.
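The cascading problem can be shown with a hypothetical customers/orders schema held in plain Python structures. This is a sketch of the bookkeeping involved, not of any database feature such as `ON UPDATE CASCADE`; all table and column names are made up.

```python
# Hypothetical schema: every order references a customer.

# Natural key: the customer's email is the PK and is copied into each order.
customers_nat = {"ann@old.example": {"name": "Ann"}}
orders_nat = [
    {"order_id": 1, "customer_email": "ann@old.example"},
    {"order_id": 2, "customer_email": "ann@old.example"},
]


def change_email_natural(old, new):
    """Changing a natural PK cascades to every referencing row.

    Returns the number of rows written (parent plus children).
    """
    customers_nat[new] = customers_nat.pop(old)
    written = 1
    for order in orders_nat:
        if order["customer_email"] == old:
            order["customer_email"] = new
            written += 1
    return written


# Surrogate key: orders reference an immutable id; email is an ordinary column.
customers_sur = {1: {"name": "Ann", "email": "ann@old.example"}}
orders_sur = [
    {"order_id": 1, "customer_id": 1},
    {"order_id": 2, "customer_id": 1},
]


def change_email_surrogate(customer_id, new):
    """Changing the email rewrites only the customer row. Returns rows written."""
    customers_sur[customer_id]["email"] = new
    return 1
```

With two orders, the natural-key change writes 3 rows while the surrogate-key change writes 1; the gap grows with every referencing row and every referencing table.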
In terms of performance, depending on the row count, the data types of the columns, your storage engine, and the amount of RAM you have dedicated to MySQL, multi-column keys can affect performance due to the sheer size of the index.
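One way the "sheer size" effect shows up is a rough back-of-the-envelope estimate. It assumes InnoDB-style secondary indexes, where each entry carries a copy of the full PK value, so a wide composite PK is repeated in every secondary index. The row count, index count, and column widths below are hypothetical, and record headers, fill factor, and compression are ignored.

```python
def pk_bytes_in_secondary_indexes(rows, pk_bytes, n_secondary):
    """Rough bytes spent on PK copies inside secondary indexes.

    Assumes InnoDB-style secondary indexes, where every entry carries the
    full PK value. Ignores record headers, page fill factor and compression.
    """
    return rows * pk_bytes * n_secondary


N_ROWS = 10_000_000   # hypothetical table size
N_SECONDARY = 3       # hypothetical number of secondary indexes

# 4 bytes: INT UNSIGNED surrogate key.
surrogate = pk_bytes_in_secondary_indexes(N_ROWS, 4, N_SECONDARY)
# 37 bytes: e.g. ~30 bytes of VARCHAR data + 4-byte INT + 3-byte DATE.
composite = pk_bytes_in_secondary_indexes(N_ROWS, 37, N_SECONDARY)

print(f"surrogate PK: {surrogate / 2**20:.0f} MiB, composite PK: {composite / 2**20:.0f} MiB")
```

Under these assumptions the PK copies alone grow from 120,000,000 bytes to 1,110,000,000 bytes, which is memory the buffer pool then cannot spend on other data.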
In my experience, it is almost always easier in terms of development and maintenance to use a surrogate key as the PK and then create indexes that cover your queries across the natural keys. However, the only way to determine the true performance impact for your application is to benchmark it with a realistic load and dataset.
HTH -
Chris
I wouldn't think that there would be any performance problems with a multi-column primary key. It's more or less equivalent to having multiple indexes (you will spend a little bit more time computing index values when doing inserts).
Sometimes the data model makes more sense with multiple keys. I'd worry about being straightforward first and worry about performance second. You can always add more indexes, improve your queries, or twiddle server settings.
I think the most I've encountered was a 4-column primary key. Makes me cringe a little bit, but it worked¹.
[1] "worked" is defined to mean "the application performed to specification", and is not meant to imply that actual tasks were accomplished using said application. :)

MySQL 5.0 indexes - Unique vs Non Unique

What is the difference between MySQL unique and non-unique index in terms of performance?
Let us say I want to make an index on a combo of 2 columns, and the combination is unique, but I create a non-unique index. Will that have any significant effect on the performance or the memory MySQL uses?
Same question: is there a difference between a primary key and a unique index?
UNIQUE and PRIMARY KEY are constraints, not indexes. Though most databases implement these constraints by using an index. The additional overhead of the constraint in addition to the index is insignificant, especially when you count the cost of tracking down and correcting unintentional duplicates when (not if) they occur.
Indexes are usually more effective if you have high selectivity. This is the ratio of the number of distinct values to the total number of rows.
For example, in a column for Social Security Number, you may have 1 million rows with 1 million distinct values. So the selectivity is 1000000/1000000 = 1.0 (although there are rare historical exceptions, SSNs are intended to be unique).
But another column in that table, "gender" may only have two distinct values over 1 million rows. 2/1000000 = very low selectivity.
An index with a UNIQUE or PRIMARY KEY constraint is guaranteed to have a selectivity of 1.0, so it will always be as effective as an index can be.
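The selectivity ratio defined above is simple enough to compute directly. This sketch uses synthetic stand-ins for the SSN and gender columns (a `range` of distinct numbers and a repeated two-value list); the function name is hypothetical.

```python
def selectivity(values):
    """Distinct values divided by total rows (1.0 means every value is unique)."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)


ssn_like = range(1_000_000)          # 1,000,000 distinct values
gender_like = ["F", "M"] * 500_000   # 2 distinct values over 1,000,000 rows

print(selectivity(ssn_like))     # 1.0
print(selectivity(gender_like))  # 2e-06
```

Any column covered by a UNIQUE or PRIMARY KEY constraint (and holding no NULLs) would report 1.0 here, matching the claim above.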
You asked about the difference between a primary key and a unique constraint. Chiefly, it's that you can have only one primary key constraint per table (even if that constraint's definition includes multiple columns), whereas you can have multiple unique constraints. A column with a unique constraint may permit NULLs, whereas columns in primary key constraints must not permit NULLs. Otherwise, primary key and unique are very similar in their implementation and their use.
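The NULL difference can be modeled as a tiny constraint checker. It is a sketch with a hypothetical class name, not MySQL code. It follows MySQL's documented behavior that a UNIQUE index permits multiple NULLs in nullable columns (NULL never compares equal to NULL), while PRIMARY KEY columns reject NULL outright.

```python
class KeyConstraint:
    """Toy checker for PRIMARY KEY vs UNIQUE semantics on one column."""

    def __init__(self, primary):
        self.primary = primary
        self.values = set()

    def check_and_add(self, value):
        if value is None:
            if self.primary:
                raise ValueError("PRIMARY KEY columns cannot be NULL")
            # UNIQUE: NULL is never equal to another NULL, so any number
            # of NULLs may coexist and nothing is recorded for them.
            return
        if value in self.values:
            raise ValueError(f"duplicate value {value!r}")
        self.values.add(value)
```

A `KeyConstraint(primary=False)` accepts `None` twice and `5` once, then rejects a second `5`; a `KeyConstraint(primary=True)` rejects `None` immediately.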
You asked in a comment about whether to use MyISAM or InnoDB. In MySQL, they use the term storage engine. There are a bunch of subtle differences between these two storage engines, but the chief ones are:
InnoDB supports transactions, so you can choose to roll back or commit changes. MyISAM is effectively always autocommit.
InnoDB enforces foreign key constraints. MyISAM doesn't enforce or even store foreign key constraints.
If these features are things you need in your application, then you should use InnoDB.
To respond to your comment, it's not that simple. InnoDB is actually faster than MyISAM in quite a few cases, so it depends on your application's mix of selects, updates, concurrent queries, indexes, buffer configuration, etc.
See http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/ for a very thorough performance comparison of the storage engines. InnoDB wins over MyISAM frequently enough that it's clearly not possible to say one is faster than the other.
As with most performance-related questions, the only way to answer it for your application is to test both configurations using your application and a representative sample of data, and measure the results.
On a non-unique index that just happens to be unique and a unique index? I'm not sure, but I'd guess not a lot. The optimiser should examine the cardinality of the index and use that (it will always be the number of rows, for a unique index).
As far as a primary key is concerned, probably quite a lot, but it depends which engine you use.
The InnoDB engine (which is used by many people) always clusters rows on the primary key. This means that the PK is essentially combined with the actual row data. If you're doing a lot of lookups by PK (or indeed, range scans etc), this is a Good Thing, because it means that it won't need to fetch as many blocks from the disc.
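Why clustering helps PK range scans can be sketched by counting pages. This toy assumes a tiny page size of 4 rows (real InnoDB pages hold many more) and compares rows stored in PK order with rows stored in an arrival order that is shuffled relative to the PK; `pages_touched` is a hypothetical helper.

```python
import random

PAGE_SIZE = 4  # rows per page; tiny on purpose


def pages_touched(layout, lo, hi):
    """Distinct pages holding rows whose PK lies in [lo, hi].

    `layout` lists PKs in physical storage order; position // PAGE_SIZE is the
    page number. The toy scans the whole layout just to count; a real engine
    would reach the rows through an index.
    """
    return len({pos // PAGE_SIZE for pos, pk in enumerate(layout) if lo <= pk <= hi})


pks = list(range(1, 101))

# Clustered on the PK: rows are stored in PK order, so a PK range is contiguous.
clustered_layout = sorted(pks)

# Unclustered heap: rows stored in arrival order, here shuffled relative to the PK.
heap_layout = pks[:]
random.Random(0).shuffle(heap_layout)

print(pages_touched(clustered_layout, 1, 20))  # 5
print(pages_touched(heap_layout, 1, 20))
```

Twenty consecutive PKs fit in exactly 5 pages when clustered; in the shuffled heap, the same rows are typically spread over many more pages, each one a potential disk read.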
A non-PK unique index will never be clustered in InnoDB.
On the other hand, some other engines (MyISAM in particular) don't cluster the PK, so the primary key is just like a normal unique index.