Using Primary Keys as Index - mysql

In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?

Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.

Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates; fields that have no uniqueness at all are useless to index. The sweet spot is the cardinality in the middle (.5).
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
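One rough way to put a number on that uniqueness is the ratio of distinct values to total rows. This is only a sketch: the table `orders` and its columns `status` and `customer_id` are hypothetical names, not taken from the question, and the linked article may use a different method.

```sql
-- Hypothetical table and column names, for illustration only.
-- A ratio near 1 means almost every value is unique;
-- a ratio near 0 means the column holds very few distinct values.
-- Multiplying by 1.0 avoids integer division in PostgreSQL.
-- COUNT(DISTINCT ...) ignores NULLs, and the table is assumed to be non-empty.
SELECT
  COUNT(DISTINCT status)      * 1.0 / COUNT(*) AS status_ratio,
  COUNT(DISTINCT customer_id) * 1.0 / COUNT(*) AS customer_id_ratio
FROM orders;
```

Note that this query reads the whole table, so on a large table it is better run against a sample or during a quiet period.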

Primary keys will have an index only if you formally define them as primary keys. Where most people forget to create indexes is on foreign keys, which are generally not indexed automatically and will almost always be involved in joins, and thus should be indexed. Other candidates for indexes are things you frequently filter data on that have a large number of possible values, such as names, part numbers, start dates, etc.

1) Is it a good idea to make your primary key an index? (assuming the primary key is unique, an id)
All DBMSes I know of will automatically create an index underneath the PK.
In the case of MySQL/InnoDB, the PK will not just be indexed; that index will be the clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
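To see the automatically created index for yourself in MySQL, something like the following should work. The table `customer` and its columns are hypothetical, used only for illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE customer (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Lists the table's indexes. The primary key shows up with
-- Key_name = 'PRIMARY' even though no CREATE INDEX was issued.
SHOW INDEX FROM customer;
```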
2) How would I know what columns to index?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed up SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow down INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
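A minimal MySQL sketch of that tradeoff, reusing the placeholder names T and FOO from above. The other column names, the index name, and the values are assumptions made up for the example, and the exact EXPLAIN output depends on the MySQL version and the data in the table.

```sql
-- T and FOO are the placeholders used above; ID, BAR and idx_t_foo are hypothetical.
CREATE TABLE T (
  ID  INT NOT NULL AUTO_INCREMENT,
  FOO INT NOT NULL,
  BAR VARCHAR(50),
  PRIMARY KEY (ID)
) ENGINE = InnoDB;

-- Before: with no index on FOO, the plan typically shows a full table scan (type = ALL).
EXPLAIN SELECT * FROM T WHERE FOO = 42;

CREATE INDEX idx_t_foo ON T (FOO);

-- After: the plan can use idx_t_foo for the lookup (typically type = ref).
EXPLAIN SELECT * FROM T WHERE FOO = 42;

-- Every insert now has to maintain idx_t_foo in addition to the primary key.
INSERT INTO T (FOO, BAR) VALUES (42, 'example');
```

Comparing the two EXPLAIN results on a realistically sized copy of your data is a reasonable way to judge whether the index earns its keep.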
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.

Your primary key will always be an index.
Always create indexes on columns that help to narrow the search; for example, if a column has only 3 different values among more than a thousand rows, it is a good sign that you should index it.

Related

What is the best way to choose primary key in candidate key?

I am unsure which of several candidate keys to select as the primary key.
Assume we are using a MySQL database (InnoDB). Suppose we have one unique value, the Student Number, and another unique value, an ID Number (e.g. Social Security Number).
The Student Number and the ID Number can each be a candidate key.
In this case, which value should I set as the primary key, also considering the option of a new auto-increment column?
My guess is that InnoDB (MySQL) uses the primary key to create the clustering index. So, is it right to choose a column on which I need to search for a specific range, since the clustering index has the advantage of being able to find a range?
Thank you!!
First, you should be aware that the US Social Security Number is not unique.
You're correct that InnoDB always treats the primary key as the clustering index. You don't have to guess; it's documented in the manual: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
You don't necessarily need to use the primary key to find a specific range. You could use a secondary index as well. You could even use a non-indexed column, but it would result in a table-scan which causes poor performance unless the table is very small.
Given the choice between searching the clustered index versus a secondary index, it's a little bit more optimized to search the clustered index.
There are exceptions to the guideline, and we can't know if your query is one of those exceptions because you haven't described any of your queries.
This brings up a broader point: you can't choose the best optimization strategy without knowing the specific queries you need to optimize.
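As an illustration of range searches on the clustered index versus a secondary index, here is one possible layout. It is a sketch, not a recommendation: the table name, column names, and number ranges are all hypothetical, and whether an auto-increment key is the right choice still depends on your queries.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE student (
  id             INT UNSIGNED NOT NULL AUTO_INCREMENT,
  student_number INT UNSIGNED NOT NULL,
  full_name      VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),                              -- clustered index in InnoDB
  UNIQUE KEY uq_student_number (student_number)  -- secondary index
) ENGINE = InnoDB;

-- Range search on the clustered index.
EXPLAIN SELECT * FROM student WHERE id BETWEEN 1000 AND 2000;

-- Range search that can use the secondary index instead.
EXPLAIN SELECT * FROM student
WHERE student_number BETWEEN 20230001 AND 20239999;
```

The optimizer may still choose a full scan if the range covers a large share of the table, which is one more reason to check the plan against real data.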

When would you want the primary key to be indexed with other columns?

At work we have a table where the primary key is being used as the third column of a three way index. I do not have an intimate understanding of indices so this use case confuses me. If a primary key is both unique and already indexed, what good does it serve to have an extra index that is only useful if the query includes the primary key, which is already uniquely indexed?
In certain situations, having only a single primary key index is not necessarily optimal for your queries, for example when many searches filter on the same combination of several columns. In these cases it makes sense to define indexes on those multiple columns, so that the queries can use the additional indexes for better data retrieval.
Try this article; it has more info with some examples: http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
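To make the multiple-column case concrete, here is a sketch of a three-column index with the primary key as its last column, similar in shape to the one described in the question. The table, columns, and index name are hypothetical; the real table at the asker's workplace is not shown in the question.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE shipment (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id INT UNSIGNED NOT NULL,
  status      TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  KEY idx_customer_status_id (customer_id, status, id)
) ENGINE = InnoDB;

-- Filters on the two leftmost columns and sorts on the third:
-- this query can be answered from idx_customer_status_id.
EXPLAIN SELECT id FROM shipment
WHERE customer_id = 7 AND status = 1
ORDER BY id;

-- No condition on customer_id, the leftmost column, so the index
-- cannot be used for a direct lookup here.
EXPLAIN SELECT id FROM shipment WHERE status = 1;
```

The point is that such an index serves queries that start from the leading columns, which the primary key index on its own cannot do.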

Is automatically indexing primary key really good?

In some DBMSs like MySQL, the primary key is always indexed by default. I know indexing can make operations like selection and comparison on the indexed column much faster, but it can also slow down other operations like insertion and update. There are cases where there are few selections on the primary key of a table, in which indexing will not bring much benefit. In such cases, wouldn't it be better not to index the primary key?
Clarification: I just learned that the primary key is actually implemented by a special index, like the clustered index for InnoDB. An index can definitely be used to enforce the uniqueness constraint of a primary key, but is it really necessary to use an index to do this? From what I know, an index is often implemented as a B-tree, which can improve the performance of many more operations than just checking uniqueness, which can be done simply with a hash table. So why not use other, simpler structures to enforce uniqueness that have less negative impact on the performance of insert and update operations?
The article here mentions a similar point:
Unique indexes use as much space as nonunique indexes do. The value of
every column as well as the record's location is stored. This can be a
waste if you use the unique index as a constraint and never as an
index. Put another way, you may rely on the unique index to enforce
uniqueness but never write a query that uses the unique value. In this
case, there's no need for MySQL to store the locations of every record
in the index: you'll never use them.
And in the following paragraph:
Unfortunately, there's no way to signal your intentions to MySQL. In
the future, we'll likely find a feature introduced for this specific
case. The MyISAM storage engine already has support for unique columns
without an index (it uses a hash-based system), but the mechanism
isn't exposed at the SQL level yet.
The "hash-based system" is an example of what I meant by "other simpler structures".
A primary key that isn't indexed is neither primary nor even a key.
Your question doesn't make sense.
Let's go back in history about 20 years, to when MySQL was just getting started. The inventor said to himself, "What indexing system is simple, efficient, and generally useful?" The answer was BTree. So BTrees are all that existed for a long time in MySQL. Then he asked himself, "What bells and whistles should we put on the PRIMARY KEY?" The answer was KISS: make it identical to other UNIQUE indexes. This was the MyISAM engine.
Later (about 15 years ago) another inventor joined forces. He brought the 'simple', yet transactional, InnoDB engine. Since transactions really need a PK, InnoDB has a PK that is UNIQUE and clustered. And, again, the data+PK is a BTree.
Every so often someone would ask "Do we need bitmap indexes, hash indexes, a second clustered index, etc." The answer always came back, "No, BTree is good enough." A few non-MySQL engines have been invented to do non-BTree indexes. Perhaps the most successful is Tokutek and its "Fractal index". MariaDB now includes TokuDB. Another is the "columnar indexing" of Infinidb.
(Apologies to Monty and Heikki if they did not actually ask those questions.)
Hash and BTree indexes are about equally fast for "point queries". But for "range queries", Hash is useless and BTree is excellent. Why implement both when one is clearly better?
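The point-versus-range difference can be tried out in MySQL's MEMORY storage engine, which is one place where both index types can be declared. This is only a sketch under that assumption; the table, column, and index names are hypothetical.

```sql
-- Hypothetical table, for illustration only. MEMORY tables accept both index types.
CREATE TABLE lookup_demo (
  id  INT NOT NULL,
  val INT NOT NULL,
  INDEX idx_id_hash   (id)  USING HASH,
  INDEX idx_val_btree (val) USING BTREE
) ENGINE = MEMORY;

-- Point query: the hash index can be used for an equality comparison.
EXPLAIN SELECT * FROM lookup_demo WHERE id = 42;

-- Range query on the hash-indexed column: the hash index cannot help.
EXPLAIN SELECT * FROM lookup_demo WHERE id BETWEEN 10 AND 20;

-- Range query on the BTree-indexed column: the index can be used.
EXPLAIN SELECT * FROM lookup_demo WHERE val BETWEEN 10 AND 20;
```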

SQL: Ideal engine type (MyISAM vs InnoDB) and data type for unique text column

Hi I have a php website using mysql and I have a table with a column called 'Name'.
I intend for it to have the following features:
It should be a varchar(N) type just like how regular names are.
It may be long, but should never contain so-called "descriptions" as that is in another field that I don't care about searching. (maybe in the future, which I might even just put it in another table)
It MUST be unique and searchable, which seems to make it a suitable candidate as a primary key.
Searches are the simple types, just behaviours like the mysql LIKE %keyword% will do.
This table is (very) frequently read, new rows inserted every once in a while, rows removed/updated very rarely.
Many other tables refer to values on this table, which ideally I want to have foreign key constraints on the other tables which leads me to want to use InnoDB.
My question is, should I use MyISAM or InnoDB for this table? Also is it ok for my not so long varchar to be used as a primary key considering the frequency of read/amount of memory used/amount of warnings on the internet against varchar primary keys?
But I would really want to benefit from the foreign-key constraints that InnoDB offers, or should I just worry about that at the PHP level?
My concern, in particular, is MyISAM's Full-text search capabilities. I tried to read the official mysql webpages to understand what is it for, but failed to understand enough to judge if my situation will benefit from it.
Short answer: InnoDB with surrogate primary key.
Longer Answer
Since you intend for the table with the Name column to have many child tables, I'd recommend a surrogate key using an INT UNSIGNED (or even BIGINT UNSIGNED if your data warrants that). That way all your child tables aren't required to have a Name column in them, saving space.
In InnoDB, short primary keys are the best option, because the primary key is included in all secondary indexes: http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html
FULLTEXT indexes are not required to do simple LIKE('%keyword%') matching. They help if you're interested in natural language matching, which you did not indicate as a requirement.
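A minimal sketch of what that could look like. All table, column, and constraint names are hypothetical, and VARCHAR(100) is an assumed length, since the question only says varchar(N).

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE item (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- surrogate primary key
  name VARCHAR(100) NOT NULL,                  -- assumed length
  PRIMARY KEY (id),
  UNIQUE KEY uq_item_name (name)               -- still enforces unique names
) ENGINE = InnoDB;

-- Child tables reference the short integer key instead of repeating the name.
CREATE TABLE item_detail (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  item_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  KEY idx_item_detail_item (item_id),
  CONSTRAINT fk_item_detail_item
    FOREIGN KEY (item_id) REFERENCES item (id)
) ENGINE = InnoDB;

-- The simple keyword search from the question needs no FULLTEXT index.
SELECT id, name FROM item WHERE name LIKE '%keyword%';
```

Because the pattern starts with a wildcard, the LIKE search cannot seek into the index on name, so expect a scan as the table grows.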

Index a mysql table of 3 integer fields

I have a mysql table of 3 integer fields. None of the fields have a unique value - but the three of them combined are unique.
When I query this table, I only search by the first field.
Which approach is recommended for indexing such table?
Having a multiple-field primary key on the 3 fields, or setting an index on the first field, which is not unique?
Thanks,
Doori Bar
Both. You'll need the multi-field primary key to ensure uniqueness, and you'll want the index on the first field for speed during searches.
You can have a UNIQUE Constraint on the three fields combined to meet your data quality standards. If you are primarily searching by Field1 then you should have an index on it.
You should also consider how you JOIN this table.
Your indexes should really support the bigger workload first - you will have to look at the execution plan to determine what suits you best.
The primary key will prevent your application from accidentally inserting duplicate rows. You probably want that.
Order the columns in the PK correctly though or make an index on the first column clustered for better performance. Compare how the query runs (with the PK present) and with and without the index on the first column.
If you're using InnoDB, you must have a clustered index. If you don't specify one, MySQL will use one in the background anyway. So, you may as well use a clustered (unique) primary key by combining all three columns.
The primary key will also then prevent duplicates, which is a bonus.
If you're returning all three integer fields, then you'll have a covering index, which means that the database won't even have to touch the actual record. It will get everything it needs right from the index.
The only caveat would be inserts (and appends). Updating a clustered index, especially on multiple columns, does carry some performance penalty. It will be up to you to test and determine the best approach.
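As a sketch of the composite-key approach for MySQL/InnoDB, assuming the hypothetical names `triple` and `field1` to `field3`: with the searched column placed first, a query on field1 alone can use the leftmost prefix of the primary key.

```sql
-- Hypothetical names, for illustration only.
CREATE TABLE triple (
  field1 INT NOT NULL,
  field2 INT NOT NULL,
  field3 INT NOT NULL,
  PRIMARY KEY (field1, field2, field3)  -- enforces uniqueness of the combination
) ENGINE = InnoDB;

-- The search column is the leftmost column of the primary key,
-- so this lookup can use the PRIMARY index.
EXPLAIN SELECT field1, field2, field3 FROM triple WHERE field1 = 7;
```

Whether a separate index on field1 adds anything on top of this is worth checking with EXPLAIN and timing on your own data, as suggested above.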