Why should we have an ID column in the users table? - mysql

It's obvious that we already have another unique piece of information about each user: the username. So why do we need another unique value for each user? Why should we also have an id for each user? What would happen if we omitted the id column?

Even if your username is unique, there are a few advantages to having an extra id column instead of using the varchar as your primary key.
Some people prefer to use an integer column as the primary key, to serve as a surrogate key that never needs to change, even if other columns are subject to change. Although there's nothing preventing a natural primary key from being changeable too, you'd have to use cascading foreign key constraints to ensure that the foreign keys in related tables are updated in sync with any such change.
The primary key being a 32-bit integer instead of a varchar can save space. The saving repeats in the foreign key column of every other table that references your user table, so the choice between an int and a varchar foreign key can be a good reason on its own.
Inserting into the primary key index is a little more efficient if you add new rows to the end of the index, compared to wedging them into the middle of the index. Indexes in MySQL tables are usually B+Tree data structures, and you can study these to understand how they perform.
Some application frameworks prefer the convention that every table in your database has a primary key column called id, instead of using natural keys or compound keys. Following such conventions can make certain programming tasks simpler.
None of these issues are deal-breakers. And there are also advantages to using natural keys:
If you look up rows by username more often than you search by id, it can be better to choose the username as the primary key, and take advantage of the index-organized storage of InnoDB. Make your primary lookup column be the primary key, if possible, because primary key lookups are more efficient in InnoDB (you should be using InnoDB in MySQL).
As you noticed, if you already have a unique constraint on username, it seems a waste of storage to keep an extra id column you don't need.
Using a natural key means that foreign keys contain a human-readable value, instead of an arbitrary integer id. This allows queries to use the foreign key value without having to join back to the parent table for the "real" value.
The point is that there's no rule that covers 100% of cases. I often recommend that you should keep your options open, and use natural keys, compound keys, and surrogate keys even in a single database.
I cover some issues of surrogate keys in the chapter "ID Required" in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
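The trade-off around changeable natural keys can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the users and posts tables are hypothetical. With username as the primary key, the child table holds a readable value, and ON UPDATE CASCADE keeps it in sync when the username changes.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
    -- Natural key: the username itself is the primary key.
    CREATE TABLE users (
        username TEXT PRIMARY KEY
    );
    -- The child row stores the readable username (no join needed to show it),
    -- but a rename must be propagated, here by ON UPDATE CASCADE.
    CREATE TABLE posts (
        post_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL REFERENCES users(username) ON UPDATE CASCADE,
        title    TEXT NOT NULL
    );
    INSERT INTO users VALUES ('alice');
    INSERT INTO posts (username, title) VALUES ('alice', 'First post');
""")

# Renaming the user rewrites the referencing row in posts as well.
conn.execute("UPDATE users SET username = 'alice_smith' WHERE username = 'alice'")
post_owner = conn.execute("SELECT username FROM posts").fetchone()[0]
print(post_owner)
```

With an integer surrogate key, the same rename would touch only the users row, at the cost of a join whenever the readable name is needed.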

This identifier is known as a Surrogate Key. The page I linked lists both the advantages and disadvantages.
In practice, I have found them to be advantageous because even superkey data can change over time (e.g. a user's email address may change, and thus any corresponding relations must change), but a surrogate key never needs to change for the data it identifies because its value is meaningless to the relation.
It's also nice from a JOIN standpoint because it can be an integer with a smaller key length than a varchar.
I can say that in practice I prefer to use them. I have been bitten too many times by having multiple-column primary keys or a data-representative superkey used across tables having to become non-unique later due to changing requirements during development, and that is not a situation you want to deal with.

In my opinion, every table should have a unique, auto-incremented id.
Here are some practical reasons. If you have duplicate rows, you can readily determine which row to delete. If you want to know the order in which rows were inserted, you have that information in the id. As for users, there's more than one "John Smith" in the world. An id provides a key for foreign references.
Finally, just about anything that might describe a user -- a name, an address, a telephone number, an email address -- could change over time.
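The point about duplicate rows can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module in place of MySQL, and the users table and its contents are made up. Two rows are exact duplicates and a third belongs to a different John Smith; the id makes it easy to keep the earliest copy and delete the rest.

```python
import sqlite3

# SQLite stands in for MySQL; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- MySQL spells this AUTO_INCREMENT
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    INSERT INTO users (name, email) VALUES
        ('John Smith', 'john@example.com'),
        ('John Smith', 'john@example.com'),    -- accidental duplicate of row 1
        ('John Smith', 'jsmith@example.org');  -- a different John Smith
""")

# Keep the lowest id of each (name, email) group and delete the later copies.
# Note: MySQL rejects a subquery on the table being deleted from, so there this
# would need a derived table or a self-join instead.
conn.execute("""
    DELETE FROM users
    WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY name, email)
""")

remaining = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(remaining)
```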

In MySQL we have:
1: Index fields, 2: Unique fields, and 3: PK fields.
An index makes a column quick to look up.
Unique means a value may appear in only one row of the table.
PK = index + unique
A table may have lots of unique fields, like
username, passport code, or email.
But you still need a field like ID that is both unique and indexed (= PK): first, it always identifies one thing and never changes; second, it is unique; and third, it is simple (because it is usually a number).

One reason to have a numeric id is that an index on it is leaner than one on a text field, reducing the index size and the processing time required to look up a specific user. It also takes fewer bytes to store when a different table cross-references a user (as is normal in a relational database).

Related

Should one combine foreign keys that point to the same table if all columns are required?

I encounter this situation frequently. An example,
A user is uniquely identified by appId, externalUserId.
Table xxxContract has a foreign key (fileUploadId, appId, externalUserId) to table fileUpload that ensures the file upload belongs to the specified user.
Table xxxContract has a foreign key (businessId, appId, externalUserId) to table business that ensures the business belongs to the specified user.
With the above two, we guarantee user A's file upload won't be used as a contract for user B's business.
xxxContract also has a fileTypeId column that is STORED GENERATED to a certain value that says "This contract is of file type XXX_CONTRACT"
Table xxxContract also has a foreign key (fileUploadId, fileTypeId) to table fileUpload.
This guarantees we only use XXX_CONTRACT file uploads for xxxContract, and not accidentally use other file types.
Given the above, we have this situation where we have two foreign keys that point to the same table fileUpload, and even have overlapping columns,
(fileUploadId, appId, externalUserId)
(fileUploadId, fileTypeId)
And all the columns are NOT NULL.
So, it seems to me like it's safe to combine the foreign keys into one larger foreign key,
(fileUploadId, appId, externalUserId, fileTypeId)
And we'll still have the same guarantees as before.
My gut feeling is that I should not combine the foreign keys because separating them by meaning and giving the FKs meaningful names helps with maintainability.
But I've never had a formal education with these things so I'd like to know what the industry standard is.
Related, is there a performance benefit to combining them vs. separating them?
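The two separate foreign keys described above can be sketched as follows. Several things here are assumptions for illustration: Python's sqlite3 module stands in for MySQL, the snake_case table and constraint names are invented, the business table is left out, and the STORED GENERATED fileTypeId column is approximated by a DEFAULT plus a CHECK constraint.

```python
import sqlite3

# SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE file_upload (
        file_upload_id   INTEGER PRIMARY KEY,
        app_id           INTEGER NOT NULL,
        external_user_id TEXT    NOT NULL,
        file_type_id     INTEGER NOT NULL,
        -- Each composite foreign key needs a matching unique key on the parent.
        UNIQUE (file_upload_id, app_id, external_user_id),
        UNIQUE (file_upload_id, file_type_id)
    );
    CREATE TABLE xxx_contract (
        contract_id      INTEGER PRIMARY KEY,
        file_upload_id   INTEGER NOT NULL,
        app_id           INTEGER NOT NULL,
        external_user_id TEXT    NOT NULL,
        -- Stand-in for the generated column: always 1 (= XXX_CONTRACT).
        file_type_id     INTEGER NOT NULL DEFAULT 1 CHECK (file_type_id = 1),
        CONSTRAINT fk_upload_belongs_to_user
            FOREIGN KEY (file_upload_id, app_id, external_user_id)
            REFERENCES file_upload (file_upload_id, app_id, external_user_id),
        CONSTRAINT fk_upload_is_contract_type
            FOREIGN KEY (file_upload_id, file_type_id)
            REFERENCES file_upload (file_upload_id, file_type_id)
    );
    INSERT INTO file_upload VALUES (10, 1, 'userA', 1);  -- user A, contract file
    INSERT INTO file_upload VALUES (11, 1, 'userA', 2);  -- user A, other file type
""")

def try_insert(file_upload_id, external_user_id):
    """Try to create a contract; return True if both foreign keys accept it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO xxx_contract (file_upload_id, app_id, external_user_id) "
                "VALUES (?, 1, ?)",
                (file_upload_id, external_user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(10, "userA"),  # own upload, right type
    try_insert(10, "userB"),  # someone else's upload
    try_insert(11, "userA"),  # own upload, wrong type
]
print(results)
```

In this sketch each named constraint documents one rule. A single combined four-column foreign key would need a matching four-column unique key on file_upload in place of the two shown.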
"But I've never had a formal education with these things so I'd like to know what the industry standard is."
The standard is that there is no standard.
As you already noted, you can use multiple columns to define a primary key. When those columns hold real-world data, this is called a natural primary key. For instance, a user can be uniquely identified by first name, last name, and birthdate (at least almost always).
Keys of this kind are also called composite keys, because no single column is sufficient on its own; only in combination do the columns form a primary key.
Surrogate (or artificial) primary keys are also well known: an id column using auto-increment.
So, as to your question: yes, if you have 3 columns that already form a natural primary key, it is completely safe to add more columns. Since the 3 columns already present uniquely identify the row, there is no harm in adding a 4th, 5th, or even 6th column to the key.
Whether you use natural or surrogate primary keys depends on personal preference, I'd say. I never use natural keys, even on tables where this is possible.
Keep in mind that whenever you need to delete or update something, you always need to know the primary key. Hence, with natural keys, you need to pass multiple values through many method calls, while surrogate keys offer the advantage of having just one id to uniquely identify a row. No more information is required.
Performance-wise, I assume that (integer-based) surrogate primary keys tend to be faster than (string-based) natural primary keys. There are also fewer columns to consider when writing queries and designing indexes.

Can we use any other unique column, such as a phone number or national ID, as the primary key in a database?

I'm creating a database that has a column for a person's mobile phone number. Now I just want to know: without making a separate id column and making it the primary key, can I make this column the primary key for this table?
As noted above, you technically could use a phone number as a primary key, but it is not a best practice, because:
You would not be able to insert another user who happens to have the same phone number (primary keys must be unique).
You will run into what is known as an "update anomaly": if other tables reference your table's primary key and you decide to change a user's mobile number, you will also have to update the mobile number in all of the dependent tables. (See "How to maintain referential integrity".)
From a performance standpoint, indexes on numeric values are usually more efficient than indexes on varchars, and will improve the performance on your joins, and the index will take up less space.
More often than not, your best bet is to use an auto-incrementing surrogate key.
Technically, you can define any column as the primary key. The question is whether such a definition is good or bad. If you are going to use a phone number (which should be stored as a string), the column will be unique as well as being the primary key, so as long as you make sure that no attempt is ever made to insert the same number for two different people, it should be OK.
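A middle ground between the two answers is to keep the phone number unique but reference people by a surrogate id. The sketch below shows that combination under stated assumptions: Python's sqlite3 module stands in for MySQL, and the person and orders tables and the fictional phone numbers are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; tables and phone numbers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        name   TEXT NOT NULL,
        mobile TEXT NOT NULL UNIQUE                -- text keeps '+' and leading zeros
    );
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person (name, mobile) VALUES ('Ann', '+15550100');
    INSERT INTO orders (person_id) VALUES (1);
""")

# Changing the number touches one row; orders still points at person 1.
conn.execute("UPDATE person SET mobile = '+15550199' WHERE id = 1")
order_owner_mobile = conn.execute(
    "SELECT p.mobile FROM orders o JOIN person p ON p.id = o.person_id"
).fetchone()[0]

# The UNIQUE constraint still stops a second person from taking the same number.
try:
    conn.execute("INSERT INTO person (name, mobile) VALUES ('Bob', '+15550199')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(order_owner_mobile, duplicate_rejected)
```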

Table without a primary key

So I've always been told that it's absolutely necessary to have a primary key specified with a table. I've been doing some work and ran into a situation where a primary key's unique constraint would stop data I need from being added.
If there's an example situation where a table was structured with fields:
Age, First Name, Last Name, Country, Race, Gender
Where if a TON of data was being entered all these fields don't necessarily uniquely identify a row and I don't need an index across all columns anyways. Would the only solution here be to make an auto-incrementing ID field? Would it be okay to NOT have a primary at all?
It's not always necessary to have a primary key; most DBMSs will allow you to construct a table without one (a).
But that doesn't necessarily mean it's a good idea. Have a think about the situation in which you want to use that data. Now think about if you have two twenty-year-old Australian men named Bob Smith, both from Perth.
Without a unique constraint, you can put both rows into the table, but here's the rub. How would you figure out which one you want to use in future? (b)
Now, if you just want to store the fact that there are one or more people meeting those criteria, you only need to store one row. But then, you'd probably have a composite primary key consisting of all columns.
If you have other information you want to store about the person (e.g., highest score in the "2048" game on their iPhone), then you don't want a primary key across the entire row, just across the columns you mention.
Unfortunately, that means there will undoubtedly come a time when both of those Bob Smiths try to write their high score to the database, only to find that one of them loses their information.
If you want them both in the table and still want to allow for the possibility outlined above (two people with identical attributes in the columns you mention) then the best bet is to introduce an artificial key such as an auto-incrementing column, for the primary key. That will allow you to uniquely identify a row regardless of how identical the other columns are.
The other advantage of an artificial key is that, being arbitrary, it never needs to change for the thing being identified. In your example, if you use age, names, nationality or location (c) in your primary key, these are all subject to change, meaning that you will need to adjust any foreign keys referencing those rows. If the tables referencing these rows uses the unchanging artificial key, that will never be a problem.
(a) There are situations where a primary key doesn't really give you any performance benefit such as when the table is particularly small (such as mapping integers 1 through 12 to month name).
In other words, things where a full table scan isn't really any slower than indexing. But these situations are incredibly rare and I'd probably still use a key because it's more consistent (especially since the use of a key tends not to make a difference to the performance either way).
(b) Keep in mind that we're talking in terms of practice here rather than theory. While in practice you may create a table with no primary key, relational theory states that each row must be uniquely identifiable, otherwise relations are impossible to maintain.
C.J. Date, who along with Codd is one of the progenitors of relational database theory, states the rules of relational tables in "An Introduction to Database Systems", one of which is:
The records have a unique identifier field or field combination called the primary key.
So, in terms of relational theory, each table must have a primary key, even though it's not always required in practice.
(c) Particularly age which is guaranteed to change annually until you're dead, so perhaps date of birth may be a better choice for that column.
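The two Bob Smiths can be reproduced in a few lines. This is a sketch, not a recommendation of a schema: Python's sqlite3 module stands in for MySQL, and the table and column names are hypothetical. Without a key, an update aimed at one Bob hits both rows; with an auto-incrementing id, it hits exactly one.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_no_key (
        first_name TEXT, last_name TEXT, country TEXT, high_score INTEGER
    );
    CREATE TABLE person_with_id (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial key
        first_name TEXT, last_name TEXT, country TEXT, high_score INTEGER
    );
    INSERT INTO person_no_key VALUES
        ('Bob', 'Smith', 'Australia', 0),
        ('Bob', 'Smith', 'Australia', 0);
    INSERT INTO person_with_id (first_name, last_name, country, high_score) VALUES
        ('Bob', 'Smith', 'Australia', 0),
        ('Bob', 'Smith', 'Australia', 0);
""")

# No key: the only way to address a row is by its data, which matches both Bobs.
cur = conn.execute(
    "UPDATE person_no_key SET high_score = 2048 "
    "WHERE first_name = 'Bob' AND last_name = 'Smith'"
)
rows_hit_without_key = cur.rowcount

# Artificial key: the second Bob can be addressed on his own.
cur = conn.execute("UPDATE person_with_id SET high_score = 2048 WHERE id = 2")
rows_hit_with_id = cur.rowcount

scores = conn.execute("SELECT id, high_score FROM person_with_id ORDER BY id").fetchall()
print(rows_hit_without_key, rows_hit_with_id, scores)
```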
"Would the only solution here be to make an auto-incrementing ID field?"
That is a valid way, but it is not the only one: you could use other ways to generate unique keys, such as using GUIDs. Keys like that are called surrogate primary keys, because they are not related to the "payload" of the data row.
"Would it be okay to NOT have a primary at all?"
Since you mentioned that the actual data in rows may not be unique, you wouldn't be able to use your table effectively without a primary key. For example, you would not be able to update or delete a specific row, which may be required, for example, when a user's name changes.
The most simple solution would be to include an ID column to serve as primary key:
id int not null primary key auto_increment
From your post it looks like the table represents a person entity. In that case, wouldn't having a PK determine each person entity uniquely? I would suggest having a primary key on the table that uniquely determines each person record.
You can either create an AUTO_INCREMENT ID column (a synthetic ID column)
(OR)
You can combine multiple columns in your table that uniquely determine all the other fields, such as (First Name, Last Name), to make a composite primary key. But that may clash as well, since more than one person could have the same full name (first name + last name).
Typically you should avoid proliferating ID primary key fields through your database.
Now, that doesn't mean you shouldn't have primary keys; your primary key can be a surrogate or a composite key. And that's what you should do here.
If the fields {Age, First Name, Last Name, Country, Race, Gender} unequivocally identify each row, then make a primary key composed of all of those fields.
But if not, then you must have some other type of information to disambiguate your data.
You can also specify no key at all and treat the table as a non-normalized, redundant data source, if that is what you need.
Use an identity column together with another column such as Last Name.

MySQL Tables with Temp Data - Include a Primary Key?

I'm putting together a new database and I have a few tables that contain temp data.
e.g.: user requests to change password - a token is stored and then later removed.
Currently I have a primary key on these tables that will auto-increment from 1 upwards.
AUTO_INCREMENT = 1;
I don't really see any use for this primary key... I will never reference it and it will just get larger.
Should tables like this have a primary key or not?
Short answer: yes.
Long answer:
You need your table to be joinable on something.
If you want your table to be clustered, you need some kind of a primary key.
If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?
In MySQL, the InnoDB storage engine always creates a clustered index even if you didn't specify a PRIMARY KEY explicitly: when there is no PRIMARY KEY and no suitable UNIQUE NOT NULL index, it adds a hidden row ID column you don't have access to.
Note that a PRIMARY KEY can be composite.
If you have a many-to-many link table, you create the PRIMARY KEY on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.
Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a UNIQUE index.
And since any PRIMARY KEY involves creating a UNIQUE index, you should declare it and get both logical consistency and performance.
There is already an SO thread with the same discussion.
Some people still prefer to go with your opinion.
My personal opinion is that you should have a primary key to identify each row and make it unique. What the key is can follow your program logic: it can be an auto-increment column, a composite key, or whatever else fits.
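For the password-reset example from the question, one option consistent with the answers above is to make the token itself the primary key, so the table has a key without an unused auto-increment column. The sketch below assumes a hypothetical password_reset table and consume helper, uses Python's sqlite3 module in place of MySQL, and skips expiry checks for brevity.

```python
import sqlite3

# SQLite stands in for MySQL; the table, columns, and token value are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE password_reset (
        token      TEXT PRIMARY KEY,   -- the lookup value doubles as the key
        user_id    INTEGER NOT NULL,
        expires_at TEXT NOT NULL
    );
""")
conn.execute(
    "INSERT INTO password_reset VALUES (?, ?, ?)",
    ("3f9c0a", 42, "2030-01-01 00:00:00"),
)

def consume(token):
    """Return the user id for a token and remove the row, or None if unknown."""
    row = conn.execute(
        "SELECT user_id FROM password_reset WHERE token = ?", (token,)
    ).fetchone()
    if row is None:
        return None
    # The primary key identifies exactly the one row to delete.
    conn.execute("DELETE FROM password_reset WHERE token = ?", (token,))
    return row[0]

first_use = consume("3f9c0a")
second_use = consume("3f9c0a")
print(first_use, second_use)
```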

When we don't need a primary key for our table?

Will it ever happen that we design a table that doesn't need a primary key?
No.
The primary key does a lot of stuff behind-the-scenes, even if your application never uses it.
For example: clustering improves efficiency (because heap tables are a mess).
Not to mention, if ANYONE ever has to do something on your table that requires pulling a specific row and you don't have a primary key, you are the bad guy.
Yes.
If you have a table that will always be fetched completely and is referred to by zero other tables, such as some kind of standalone settings or configuration table, then there is no point in having a primary key, and some would argue that adding a PK in this situation would misrepresent the normal use of such a table.
It is rare, and when it is done it is probably most often done wrongly, but such tables do exist, and such instances can be valid.
Depends.
What is primary key / unique key?
In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can have the same value (or combination of values) in those columns if NULL values are not used. Depending on its design, a table may have arbitrarily many unique keys but at most one primary key.
So, when you don't have to differentiate (uniquely identify) each row, you don't have to use a primary key.
For example, for a big log table, omitting the primary key can give you a somewhat smaller data size and faster inserts.
A primary key is not mandatory, but it is not good practice to create tables without one. The DBMS automatically creates an index on the PK, but you can also make another column unique and index it; e.g. the user_name column in a users table is usually made unique and indexed, so you might choose to skip the PK there. But that is still a bad idea, because the PK can be referenced by foreign keys for referential integrity.
In general, you should almost always have PK in a table unless you have very strong reason to justify not having a PK.
Link tables (in a many-to-many relationship) may not have a primary key. But I personally like to have a PK in those tables as well.
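A link table with a composite primary key over both foreign key columns can be sketched as follows. As an assumption for illustration, Python's sqlite3 module stands in for MySQL, and the student, course, and enrollment tables are hypothetical. The composite key guarantees that the same link cannot be recorded twice.

```python
import sqlite3

# SQLite stands in for MySQL; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Link table: the primary key spans both columns of the link.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course_id  INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann');
    INSERT INTO course  VALUES (7, 'Databases');
    INSERT INTO enrollment VALUES (1, 7);
""")

# Recording the same link a second time violates the composite primary key.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 7)")
    duplicate_link_rejected = False
except sqlite3.IntegrityError:
    duplicate_link_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(duplicate_link_rejected, link_count)
```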