What is the best way to choose a primary key among candidate keys? - mysql

I am unsure which of several candidate keys to select as the primary key.
Assume a MySQL database (InnoDB). Suppose we have two unique values: Student Number and ID Number (e.g. Social Security Number).
Student Number and ID Number can each be a candidate key.
In this case, which value should I set as the primary key, also considering a new auto-increment column?
My understanding is that InnoDB (MySQL) uses the primary key to create the clustered index. So, is it right to choose the column I need to search by range, since a clustered index has the advantage for range lookups?
Thank you!!

First, you should be aware that the US Social Security Number is not unique.
You're correct that InnoDB always treats the primary key as the clustered index. You don't have to guess; it's documented in the manual: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
You don't necessarily need to use the primary key to find a specific range. You could use a secondary index as well. You could even use a non-indexed column, but it would result in a table-scan which causes poor performance unless the table is very small.
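Given the choice between searching the clustered index versus a secondary index, it's somewhat more efficient to search the clustered index, because a secondary-index lookup that needs columns outside that index must then fetch the row through the clustered index.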
There are exceptions to the guideline, and we can't know if your query is one of those exceptions because you haven't described any of your queries.
This brings up a broader point: you can't choose the best optimization strategy without knowing the specific queries you need to optimize.
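As a rough sketch of one possible layout (not the only reasonable one), the table below uses an auto-increment surrogate as the primary key and keeps both candidate keys as unique secondary indexes. The table name, column names, types, and the range values are assumptions made for illustration; they do not come from the question.

```sql
-- Hypothetical schema: names and types are assumptions for illustration.
CREATE TABLE students (
    id             INT UNSIGNED NOT NULL AUTO_INCREMENT,
    student_number INT UNSIGNED NOT NULL,
    national_id    VARCHAR(20)  NOT NULL,
    PRIMARY KEY (id),                               -- becomes the clustered index in InnoDB
    UNIQUE KEY uq_student_number (student_number),  -- candidate key kept as a unique secondary index
    UNIQUE KEY uq_national_id (national_id)         -- second candidate key
) ENGINE=InnoDB;

-- A range search that a secondary index can serve;
-- EXPLAIN shows which index the optimizer actually picks.
EXPLAIN
SELECT id, student_number
FROM students
WHERE student_number BETWEEN 20230001 AND 20230999;
```

If range queries on student_number turn out to dominate the workload, making student_number the primary key instead is the alternative worth measuring.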

Related

When would you want the primary key to be indexed with other columns?

At work we have a table where the primary key is being used as the third column of a three way index. I do not have an intimate understanding of indices so this use case confuses me. If a primary key is both unique and already indexed, what good does it serve to have an extra index that is only useful if the query includes the primary key, which is already uniquely indexed?
In some situations the primary key index alone is not optimal for your queries, for example when queries frequently search on several columns together. In those cases it makes sense to define an index over those columns, so that such queries can use it for faster retrieval.
Try this article; it has more information and some examples: http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
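To make this concrete, here is a minimal sketch of a three-column index whose last column is the primary key. The table, columns, and literal values are hypothetical. One case where such an index can pay off is a query that filters on the first two columns and sorts or pages by the primary key.

```sql
-- Hypothetical table: names are assumptions for illustration.
CREATE TABLE tickets (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    project_id INT UNSIGNED NOT NULL,
    status     VARCHAR(15)  NOT NULL,
    PRIMARY KEY (id),
    -- Three-column index with the primary key as the third column.
    KEY idx_project_status_id (project_id, status, id)
) ENGINE=InnoDB;

-- Filters on the two leading columns and orders by the third,
-- so the index can supply both the filtering and the ordering.
EXPLAIN
SELECT id
FROM tickets
WHERE project_id = 42 AND status = 'open'
ORDER BY id;
```

Note that InnoDB secondary indexes already carry the primary key columns internally, so in MySQL listing the primary key explicitly at the end is often redundant; whether it changes the plan is something to confirm with EXPLAIN.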

Using Primary Keys as Index

In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?
Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.
Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates; fields that have no uniqueness at all are useless to index. The sweet spot is the cardinality in the middle (.5).
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
Primary keys will have an index only if you formally define them as primary keys. Where most people forget to create indexes is on foreign keys, which are generally not indexed automatically, will almost always be involved in joins, and thus should be indexed. Other candidates for indexes are things you frequently filter data on that have a large number of possible values, such as names, part numbers, start dates, etc.
1) Is it a good idea to make your primary key an index? (assuming the primary key is unique, an id
All DBMSes I know of will automatically create an index underneath the PK.
In the case of MySQL/InnoDB, the PK will not just be indexed; that index will be the clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
2) how would I know what columns to index ?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed-up the SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow-down the INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
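The tradeoff above can be sketched as follows. The column types and literal values are assumptions; T and FOO are the placeholder names used in the example above.

```sql
-- Placeholder table T with a column FOO; types are assumptions.
CREATE TABLE T (
    ID  INT NOT NULL PRIMARY KEY,
    FOO INT NOT NULL
);

-- The index that speeds up equality searches on FOO.
CREATE INDEX idx_t_foo ON T (FOO);

-- With the index in place, the plan should show an index lookup
-- rather than a scan of the whole table.
EXPLAIN SELECT * FROM T WHERE FOO = 42;

-- Every insert now has to maintain idx_t_foo in addition to the table itself.
INSERT INTO T VALUES (1, 42);
```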
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.
Your primary key will always be an index.
Create indexes on columns that help narrow the search. Be careful with low-cardinality columns: if a column has only 3 different values among more than a thousand rows, an index on it alone usually filters out too few rows to help much.

Why we should have an ID column in the table of users?

It's obvious that we already have another unique information about each user, and that is username. Then, why we need another unique thing for each user? Why should we also have an id for each user? What would happen if we omit the id column?
Even if your username is unique, there are a few advantages to having an extra id column instead of using the varchar as your primary key.
Some people prefer to use an integer column as the primary key, to serve as a surrogate key that never needs to change, even if other columns are subject to change. Although there's nothing preventing a natural primary key from being changeable too, you'd have to use cascading foreign key constraints to ensure that the foreign keys in related tables are updated in sync with any such change.
A primary key that is a 32-bit integer instead of a varchar can save space. The saving matters most in the foreign key columns of every other table that references your user table, where you choose between storing an int or a varchar.
Inserting into the primary key index is a little bit more efficient if you add new rows to the end of the index, compared to wedging them into the middle of the index. Indexes in MySQL tables are usually B+Tree data structures, and you can study these to understand how they perform.
Some application frameworks prefer the convention that every table in your database has a primary key column called id, instead of using natural keys or compound keys. Following such conventions can make certain programming tasks simpler.
None of these issues are deal-breakers. And there are also advantages to using natural keys:
If you look up rows by username more often than you search by id, it can be better to choose the username as the primary key, and take advantage of the index-organized storage of InnoDB. Make your primary lookup column be the primary key, if possible, because primary key lookups are more efficient in InnoDB (you should be using InnoDB in MySQL).
As you noticed, if you already have a unique constraint on username, it seems a waste of storage to keep an extra id column you don't need.
Using a natural key means that foreign keys contain a human-readable value, instead of an arbitrary integer id. This allows queries to use the foreign key value without having to join back to the parent table for the "real" value.
The point is that there's no rule that covers 100% of cases. I often recommend that you should keep your options open, and use natural keys, compound keys, and surrogate keys even in a single database.
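As an illustration of the natural-key option, here is a minimal sketch with username as the primary key and a cascading foreign key. Table names, column names, and the sample values are assumptions, not part of the question.

```sql
-- Hypothetical schema: username is the natural primary key.
CREATE TABLE users (
    username VARCHAR(50) NOT NULL,
    PRIMARY KEY (username)
) ENGINE=InnoDB;

CREATE TABLE posts (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(50)  NOT NULL,   -- human-readable foreign key value
    title    VARCHAR(200) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_posts_username (username),
    CONSTRAINT fk_posts_user FOREIGN KEY (username)
        REFERENCES users (username)
        ON UPDATE CASCADE             -- keeps child rows in sync if a username changes
) ENGINE=InnoDB;

-- Renaming a user propagates to posts through the cascade.
UPDATE users SET username = 'new_name' WHERE username = 'old_name';

-- No join back to users is needed to read the username.
SELECT title FROM posts WHERE username = 'new_name';
```

The cost is visible in the schema: every referencing table stores a VARCHAR(50) instead of a 4-byte integer.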
I cover some issues of surrogate keys in the chapter "ID Required" in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
This identifier is known as a Surrogate Key. The page I linked lists both the advantages and disadvantages.
In practice, I have found them to be advantageous because even superkey data can change over time (e.g. a user's email address may change and thus any corresponding relations must change), but a surrogate key never needs to change for the data it identifies because its value is meaningless to the relation.
It's also nice from a JOIN standpoint because it can be an integer with a smaller key length than a varchar.
I can say that in practice I prefer to use them. I have been bitten too many times by having multiple-column primary keys or a data-representative superkey used across tables having to become non-unique later due to changing requirements during development, and that is not a situation you want to deal with.
In my opinion, every table should have a unique, auto-incremented id.
Here are some practical reasons. If you have duplicate rows, you can readily determine which row to delete. If you want to know the order that rows were inserted, you have that information in the id. As for users, there's more than one "John Smith" in the world. An id provides a key for foreign references.
Finally, just about anything that might describe a user -- a name, an address, a telephone number, an email address -- could change over time.
In MySQL we have:
1: index fields, 2: unique fields, and 3: PK fields.
Index means the field can be looked up quickly.
Unique means a value may appear in only one row of the table.
PK = index + unique
A table may have many unique fields, such as
username, passport code, or email.
But you still need a field like ID that is both unique and indexed (= PK): first, it always identifies the same thing and never changes; second, it is unique; and third, it is simple (because it is usually a number).
One reason to have a numeric id is that an index on it is leaner than one on a text field, reducing index size and the processing time required to look up a specific user. Also, it takes fewer bytes to store when cross-referencing a user (relational database) in a different table.

Can you have multiple Keys in SQL and why would you want that?

Is there a reason why you would want to have multiple KEYs in a TABLE? What is the point of having multiple KEYs in one table?
Here is an example that I found:
CREATE TABLE orders (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    transaction_id VARCHAR(19) NOT NULL,
    payment_status VARCHAR(15) NOT NULL,
    payment_amount DECIMAL(15) NOT NULL,
    payment_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY (user_id)
);
Also, you'll notice the database programmer doesn't make transaction_id a KEY. Is there a reason for this?
KEY in MySQL is an alternate syntax for "index".
Indexes are common across databases, but they aren't covered by ANSI as of yet -- it's a pure miracle things are as similar as they are. It is common to have more than one index associated with a table, because indexes improve data retrieval at the cost of update/delete/insert speed.
Be aware that MySQL (5.x?) automatically creates an index if one doesn't already exist for the primary key of a table.
There are several possible reasons.
Enhance search performance (when the WHERE clause uses KEY fields, it performs faster)
Restrain table contents (when using a UNIQUE key for a column, it can't have the same value twice or more on it)
These are the most common reasons.
In SQL you may only have one PRIMARY KEY per table.
KEY(foo) is bogus syntax in standard SQL. In MySQL KEY foo is a poorly named synonym for INDEX foo and does not impose a UNIQUE constraint. (It is poorly named because it does not actually relate to the functioning of a key.)
There may be multiple UNIQUE INDICES which can play the role of "candidate keys". (A unique constraint can be specified without an associated INDEX, but the point of a "key" is generally a quick look-up.) The point of a PRIMARY KEY is to uniquely identify a single record and is almost exclusively INDEX-backed and may even be directly related to the clustering of the data.
Only the minimal amount of [unique] INDICES required to ensure data validity and meet performance requirements should be used -- they impose performance penalties on the query engine as well as have additional update and maintenance costs.
Non-unique INDEXes (INDEX foo, or in the case of MySQL, KEY foo) are purely to allow the database to optimize queries. They do not really map to "keys"; if selected by the query planner these indices can aid in performance even though they add nothing to the logical model itself. An index that contains every column a query needs is called a "covering index". (For performance reasons, a database engine may require that FOREIGN KEYS are backed by indices.)
In this case, creating an INDEX (don't think "KEY"!) on user_id will generally (greatly) speed up queries with clauses like:
... WHERE user_id = {somenumber}
Without the INDEX the above query would require a FULL TABLE SCAN (e.g. read through all records).
While I do not know why transaction_id is not made an index, it might not be required (or even detrimental for the given access patterns) -- consider the case where every query that needs to be fast either:
Does not use transaction_id or;
Also has a user_id = ... or other restriction that can utilize an INDEX. That is, in a case like WHERE user_id = ... AND transaction_id = ... the query planner will likely first find the records for the matched user and then look for the matching transaction_id -- it still has to do a SCAN, but only over a much smaller data-set than the original table. Only a plain WHERE transaction_id = ... would necessarily require a FULL TABLE SCAN.
If in doubt, use EXPLAIN -- or other query analyzer -- and see what MySQL thinks it should do. As a last note, sometimes estimated query execution plans may differ from actual execution plans and outdated statistics can make the query planner choose a non-ideal plan.
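For instance, assuming the orders table from the question exists, a check along these lines shows how MySQL plans each query. The literal values and the index name idx_transaction_id are made up for illustration.

```sql
-- Can use the index on user_id, then filter the matching rows by transaction_id.
EXPLAIN
SELECT * FROM orders
WHERE user_id = 42 AND transaction_id = 'TX-0001';

-- No usable index on transaction_id yet: expect a full table scan (type = ALL).
EXPLAIN
SELECT * FROM orders
WHERE transaction_id = 'TX-0001';

-- If that lookup has to be fast, add an index and compare the plan again.
ALTER TABLE orders ADD INDEX idx_transaction_id (transaction_id);

EXPLAIN
SELECT * FROM orders
WHERE transaction_id = 'TX-0001';
```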
Happy coding.
"Keys" might refer to one of:
Index (search optimization)
Constraint (e.g. foreign key, primary key)
You may want multiple because you need to implement more than one of these features in a single table. It's actually quite common.
In database theory, a key is a constraint that enforces uniqueness. A relvar (analogous to a SQL table) may indeed have more than one candidate key. A classic example is the set of chemical elements, for which name, symbol, atomic number and atomic weight are all keys (and folk will still want to add their own surrogate key to it ;)
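The chemical elements example might look like this in MySQL. This is a minimal sketch in which the column names and types are assumptions, and atomic weight is left out for brevity. One candidate key is chosen as the primary key and the others are declared UNIQUE.

```sql
-- Hypothetical table: several candidate keys, one chosen as PRIMARY KEY.
CREATE TABLE elements (
    atomic_number TINYINT UNSIGNED NOT NULL,
    symbol        VARCHAR(3)       NOT NULL,
    name          VARCHAR(30)      NOT NULL,
    PRIMARY KEY (atomic_number),
    UNIQUE KEY uq_symbol (symbol),   -- candidate key enforced by a unique index
    UNIQUE KEY uq_name (name)        -- candidate key enforced by a unique index
) ENGINE=InnoDB;

INSERT INTO elements VALUES (1, 'H', 'Hydrogen');

-- Rejected with a duplicate-entry error: the symbol 'H' is already taken.
INSERT INTO elements VALUES (2, 'H', 'Helium');
```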
MySQL continues a long tradition of abuse of the word KEY by making it a synonym for INDEX. Clearly, a MySQL table may have as many indexes as deemed necessary for performance for a given set of circumstances.
From your SQL DDL, it seems clear the ID is a surrogate key, so we must look for a natural key with not much to go on. While transaction_id may be a candidate, it is not inconceivable that an order can involve more than one transaction. In practical terms, I think an auditor would be suspicious of multiple orders made by the same user simultaneously, therefore suggest the compound of user_id and payment_time should be a key. However, if transaction_id is not a key then the table would not be fully normalized. Therefore, I'd give the designer the benefit of the doubt and assume transaction_id is also a key.

Index a mysql table of 3 integer fields

I have a mysql table of 3 integer fields. None of the fields have a unique value - but the three of them combined are unique.
When I query this table, I only search by the first field.
Which approach is recommended for indexing such table?
Having a multiple-field primary key on the 3 fields, or setting an index on the first field, which is not unique?
Thanks,
Doori Bar
Both. You'll need the multi-field primary key to ensure uniqueness, and you'll want the index on the first field for speed during searches.
You can have a UNIQUE Constraint on the three fields combined to meet your data quality standards. If you are primarily searching by Field1 then you should have an index on it.
You should also consider how you JOIN this table.
Your indexes should really support the bigger workload first - you will have to look at the execution plan to determine what suits you best.
The primary key will prevent your application from accidentally inserting duplicate rows. You probably want that.
Order the columns in the PK correctly though or make an index on the first column clustered for better performance. Compare how the query runs (with the PK present) and with and without the index on the first column.
If you're using InnoDB, you must have a clustered index. If you don't specify one, MySQL will use one in the background anyway. So, you may as well use a clustered (unique) primary key by combining all three columns.
The primary key will also then prevent duplicates, which is a bonus.
If you're returning all three integer fields, then you'll have a covering index, which means that the database won't even have to touch the actual record. It will get everything it needs right from the index.
The only caveat would be inserts (and appends). Updating a clustered index, especially on multiple columns, does carry some performance penalty. It will be up to you to test and determine the best approach.
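A minimal sketch of the composite primary key approach, with made-up table and column names. Because MySQL can use the leftmost prefix of a multi-column index, putting the searched column first lets the primary key serve searches on that column alone.

```sql
-- Hypothetical table: three integers that are unique only in combination.
CREATE TABLE triples (
    a INT NOT NULL,   -- the column used in searches, placed first
    b INT NOT NULL,
    c INT NOT NULL,
    PRIMARY KEY (a, b, c)   -- enforces uniqueness; clustered in InnoDB
) ENGINE=InnoDB;

-- Uses the leftmost prefix (a) of the primary key, so no separate index on a is needed.
EXPLAIN
SELECT a, b, c
FROM triples
WHERE a = 7;
```

Comparing this plan with and without an extra index on the first column is a quick way to see whether the second index adds anything for your workload.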