MySQL Tables with Temp Data - Include a Primary Key?

I'm putting together a new database and I have a few tables that contain temp data.
e.g. a user requests a password change: a token is stored and later removed.
Currently I have a primary key on these tables that will auto-increment from 1 upwards.
AUTO_INCREMENT = 1;
I don't really see any use for this primary key... I will never reference it and it will just get larger.
Should tables like this have a primary key or not?

Short answer: yes.
Long answer:
You need your table to be joinable on something. If you want your table
to be clustered, you need some kind of a primary key. If your table
design does not need a primary key, rethink your design: most
probably, you are missing something. Why keep identical records? In
MySQL, if you don't specify a PRIMARY KEY explicitly (and there is no
UNIQUE index on NOT NULL columns it can use instead), the InnoDB storage
engine creates a hidden clustered index on a 6-byte row ID column that
you don't have access to.
Note that a PRIMARY KEY can be composite.
If you have a many-to-many link table, you create the PRIMARY KEY on
all fields involved in the link. Thus you ensure that you don't have
two or more records describing one link.
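A minimal sketch of that guarantee, run through Python's sqlite3 module with SQLite standing in for MySQL (the composite PRIMARY KEY behaves the same way for this purpose). The student_course table and its columns are hypothetical, chosen only to illustrate a many-to-many link.

```python
import sqlite3

# SQLite stands in for MySQL; a composite PRIMARY KEY rejects duplicate pairs in both.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student_course (      -- hypothetical many-to-many link table
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)  -- at most one row per link
    )
""")
conn.execute("INSERT INTO student_course VALUES (1, 10)")
conn.execute("INSERT INTO student_course VALUES (1, 11)")  # a different link: accepted


def try_insert(student_id, course_id):
    """Return True if the link was stored, False if the primary key rejected it."""
    try:
        conn.execute("INSERT INTO student_course VALUES (?, ?)", (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(1, 10)  # the same link a second time
row_count = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
```

The second (1, 10) insert fails with an integrity error, so the table keeps exactly two rows.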
Besides the logical consistency issues, most RDBMS engines will
benefit from including these fields in a UNIQUE index.
And since any PRIMARY KEY involves creating a UNIQUE index, you should
declare it and get both logical consistency and performance.
There is already a Stack Overflow thread with the same discussion, and some people there share your opinion.
My personal opinion is that you should have primary keys, to identify a row or to make it unique. The key can come from your program logic: it can be an auto-increment column, a composite key, or whatever fits.
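For the password-reset case in the question, one option along those lines is to let the token itself be the primary key, so no unused auto-increment column is needed. This is a sketch under assumptions: the password_reset table, its columns, and the issue_token/redeem_token helpers are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical temp-data table: the token is the natural primary key.
conn.execute("""
    CREATE TABLE password_reset (
        token      CHAR(32) NOT NULL PRIMARY KEY,
        user_id    INTEGER  NOT NULL,
        created_at TEXT     NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")


def issue_token(user_id):
    """Store a new random token (32 hex characters) for the user and return it."""
    token = secrets.token_hex(16)
    conn.execute("INSERT INTO password_reset (token, user_id) VALUES (?, ?)", (token, user_id))
    return token


def redeem_token(token):
    """Look the token up by primary key, delete the temp row, and return the user id."""
    row = conn.execute("SELECT user_id FROM password_reset WHERE token = ?", (token,)).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM password_reset WHERE token = ?", (token,))
    return row[0]


t = issue_token(42)
first = redeem_token(t)   # the owner's user id
second = redeem_token(t)  # None: the row was already removed
remaining = conn.execute("SELECT COUNT(*) FROM password_reset").fetchone()[0]
```

Every lookup and delete goes through the primary key, and the key never grows past the tokens that are currently live.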

Related

Dimensional Modeling: how to create a table without Surrogate Primary Keys?

From what I understand, a fact table doesn't have its own primary key, and adding a surrogate key is somewhat of a waste of space. Hence, the combination of foreign keys is the primary key for the fact table.
But in my case, I was not able to do that, because the supposedly unique key combination can repeat in the fact table, e.g. the same person paid twice on the same day in the same restaurant. In this case, the primary key is no longer unique...
Is there any way to solve this problem without adding a surrogate key?
Thanks in advance!
If you are building a table like this, a primary key or unique key combination is strongly recommended. If you want to avoid adding a PK, you may want to add unique transaction numbers, so that the combination of customer number and transaction number can serve as the key.
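A minimal sketch of that suggestion, assuming a hypothetical payment_fact table where txn_no is a per-customer transaction number; SQLite (via Python's sqlite3) stands in for MySQL. The same person paying twice on the same day in the same restaurant gets two distinct keys, while reusing a (customer_id, txn_no) pair is still rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table keyed by (customer_id, txn_no) instead of a surrogate.
conn.execute("""
    CREATE TABLE payment_fact (
        customer_id   INTEGER NOT NULL,
        txn_no        INTEGER NOT NULL,
        restaurant_id INTEGER NOT NULL,
        pay_date      TEXT    NOT NULL,
        amount        REAL    NOT NULL,
        PRIMARY KEY (customer_id, txn_no)
    )
""")
conn.executemany("INSERT INTO payment_fact VALUES (?, ?, ?, ?, ?)", [
    (7, 1, 3, "2024-05-01", 12.50),
    (7, 2, 3, "2024-05-01", 12.50),  # same person, same day, same restaurant
])

same_day_count = conn.execute(
    "SELECT COUNT(*) FROM payment_fact WHERE customer_id = 7 AND pay_date = '2024-05-01'"
).fetchone()[0]

# Reusing an existing (customer_id, txn_no) pair violates the primary key.
try:
    conn.execute("INSERT INTO payment_fact VALUES (7, 1, 3, '2024-05-01', 5.0)")
    reused_key_rejected = False
except sqlite3.IntegrityError:
    reused_key_rejected = True
```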
InnoDB, if you don't provide a PK, will provide one for you. But it is 6 bytes and hidden. Compared to a 4-byte surrogate INT, this is bigger!
Check the data; there may be a "natural" PK that is a column or combination of columns.
Generally, for DW, the only index I have on the Fact table is the PK. Then I use "Summary tables" for the bulk of accesses. These are smaller and faster. In an extreme case, I will purge old Fact rows (via DROP PARTITION) but hang onto the Summary tables 'forever'. This keeps the disk space in check, while losing virtually nothing useful of the data.
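A rough sketch of the summary-table idea, with hypothetical table and column names. SQLite (via Python's sqlite3) stands in for MySQL, and because SQLite has no partitions, a plain DELETE stands in for DROP PARTITION. The summary table's primary key is simply its grouping columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payment_fact (
        customer_id INTEGER, txn_no INTEGER, restaurant_id INTEGER,
        pay_date TEXT, amount REAL,
        PRIMARY KEY (customer_id, txn_no)
    )
""")
# Hypothetical daily summary; one row per (day, restaurant).
conn.execute("""
    CREATE TABLE daily_restaurant_summary (
        pay_date TEXT, restaurant_id INTEGER,
        payments INTEGER, total REAL,
        PRIMARY KEY (pay_date, restaurant_id)
    )
""")
conn.executemany("INSERT INTO payment_fact VALUES (?, ?, ?, ?, ?)", [
    (7, 1, 3, "2024-05-01", 12.5),
    (7, 2, 3, "2024-05-01", 12.5),
    (8, 1, 3, "2024-05-01", 20.0),
    (8, 2, 4, "2024-05-02", 9.0),
])

# Roll the fact rows up into the summary table.
conn.execute("""
    INSERT INTO daily_restaurant_summary
    SELECT pay_date, restaurant_id, COUNT(*), SUM(amount)
    FROM payment_fact
    GROUP BY pay_date, restaurant_id
""")

# Purge old fact rows (DROP PARTITION in MySQL); the summary rows survive.
conn.execute("DELETE FROM payment_fact WHERE pay_date < '2024-05-02'")

summary = conn.execute(
    "SELECT * FROM daily_restaurant_summary ORDER BY pay_date, restaurant_id"
).fetchall()
facts_left = conn.execute("SELECT COUNT(*) FROM payment_fact").fetchone()[0]
```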
Bottom line: Provide an explicit PK for every table.

Is a primary key necessary?

In database systems, should every table have a primary key?
For example, I have a table table1(foreignkey1, foreignkey2, attribute) like this. table1 does not have a primary key.
Should I define a primary key for this table like table1id?
This is a subjective question, so I hope you don't mind me answering with some opinion :)
In the vast majority of tables I've made – I'm talking 95%+ – I've added a primary key, and been glad I did. This is either the most critical unique field in my table (think "social security number") or, more often than not, just an auto-incrementing number that allows me to quickly and easily refer to a row when querying.
This latter use is the most common, and it even has its own name: a "surrogate" or "synthetic" key. This is a value auto-generated by the database and not derived from your application data. If you want to add relations between your tables, this surrogate key is immediately helpful as a foreign key. As someone else answered, these keys are so common that MySQL (InnoDB) adds a hidden one even if you don't, which I take to mean that the consensus is very heavily biased towards adding primary keys.
One other thing I like about primary keys is that they help convey your intent to others reading your table schemata and also to your DBMS: "this bit is how I intend to identify my rows uniquely, don't let me try to break that rule!"
To answer your question specifically: no, a primary key is not necessary. But realistically if you intend to store data in the table for any period of time beyond a few minutes, I would very strongly recommend you add one.
No, it is not required for every table to have a primary key. Whether or not a table should have a primary key is based on requirements of your database.
Even though this is allowed, it is bad practice, because it allows duplicate rows to be added, which in turn prevents the unique identification of rows. That contradicts the underlying purpose of having a database.
I am a strong fan of synthetic primary keys. These are auto-incremented columns that uniquely identify each row.
These provide functionality such as:
Ability to see the order of insertion of rows. Which were inserted most recently?
Ability to create a foreign key relationship to the table. You might not need one now, but it might be useful in the future.
Ability to rename "data" columns without affecting other tables.
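A small sketch of the first two points, using the question's table1 with an added synthetic table1id column. SQLite (via Python's sqlite3) stands in for MySQL, and the child table table1_note is hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.execute("""
    CREATE TABLE table1 (
        table1id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- synthetic key
        foreignkey1 INTEGER NOT NULL,
        foreignkey2 INTEGER NOT NULL,
        attribute   TEXT
    )
""")
# Hypothetical child table that refers to table1 through one integer column.
conn.execute("""
    CREATE TABLE table1_note (
        note_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        table1id INTEGER NOT NULL REFERENCES table1(table1id),
        note     TEXT
    )
""")
for fk1, fk2, attr in [(1, 1, "a"), (1, 2, "b"), (2, 1, "c")]:
    conn.execute(
        "INSERT INTO table1 (foreignkey1, foreignkey2, attribute) VALUES (?, ?, ?)",
        (fk1, fk2, attr),
    )

# The highest synthetic id is the most recently inserted row.
latest = conn.execute("SELECT attribute FROM table1 ORDER BY table1id DESC LIMIT 1").fetchone()[0]

conn.execute("INSERT INTO table1_note (table1id, note) VALUES (2, 'refers to row b')")
try:
    conn.execute("INSERT INTO table1_note (table1id, note) VALUES (99, 'no such row')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```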
Presumably, for your table, you can define a primary key on (foreignkey1, foreignkey2). Composite primary keys are also sensible, but they are cumbersome for foreign key relationships and joins. And, when there are foreign key relationships, they may cause additional storage, because the composite key ends up being stored across multiple tables.
It's good practice to have a primary key (or composite primary key) for a table:
it helps to join tables,
clustered tables need a primary key.
Database design should include a primary key for every table.
In MySQL, the InnoDB storage engine always creates a clustered key if you didn't specify a PRIMARY KEY explicitly; when there is no suitable UNIQUE NOT NULL index, this is a hidden row ID column you don't have access to.
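You can create a composite primary key like this:
CREATE TABLE table1 (
  FK1 INT,
  FK2 INT,
  ATTRIBUTE INT,
  PRIMARY KEY (FK1, FK2)  -- MySQL makes FK1 and FK2 NOT NULL implicitly
);
or add the constraint to an existing table1:
ALTER TABLE table1
  ADD CONSTRAINT pk_table1 PRIMARY KEY (FK1, FK2);  -- in MySQL the primary key is always named PRIMARY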

Should I create a surrogate key instead of a composite key?

Structure:
Actor <=== ActorMovie ===> Movie
ActorMovie: ActorID (fk), MovieId (fk)... ===> pk: (ActorID, MovieID)
Should I create a surrogate key for the ActorMovie table like this?
ActorMovie: ActorMovieID (pk), ActorID (fk), MovieId (fk)...
Conventions are good if they are helpful
"SQL Antipatterns", Chapter 4, "ID Required"
Intention of primary key
A primary key is something you can use to identify a row by its unique address in the table. This means a primary key doesn't have to be a surrogate column. In fact, a primary key should be:
Unique. It identifies each row. If it's compound, every combination of the columns' values must be unique.
Minimal. It can't be reduced (i.e. if it's compound, no column could be omitted without losing uniqueness).
Single. No other primary key may be defined; each table can have only one primary key.
Compound versus surrogate
There are cases where a surrogate key has benefits. The most common one is a table of people's names. Can the combination first_name + last_name + taxpayer_id be unique? In most cases, yes. But in theory, duplicates could occur. So this is a case where a surrogate key will uniquely identify rows no matter what.
However, if we're talking about a many-to-many link between tables, it's obvious that the linking table should contain each pair only once. In fact, you'd even need to check that a duplicate doesn't exist before writing to that table (otherwise it's a redundant row, because it holds no additional information, unless your design has a special reason to store it). Therefore, your combination of ActorID + MovieID satisfies all the conditions for a primary key, and there's no need to create a surrogate key. You can do it, but it would make little sense (if any), because the surrogate would carry no meaning beyond numbering rows. On the other hand, with a compound key you get:
Uniqueness by design. Your rows will be unique and no duplicates will be allowed in the linking table, which makes sense: there's no need to create a link if it already exists.
No redundant (and thus less comprehensible) column in the design. That makes your design simpler and more readable.
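A minimal sketch contrasting the two designs from the question, with SQLite (via Python's sqlite3) standing in for MySQL. The surrogate-only variant has no constraint on the (ActorID, MovieID) pair, so the same link can be stored twice; the compound-key variant refuses it. The table names with suffixes are hypothetical, chosen only to keep both variants in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Variant A: surrogate key only, nothing stops a repeated pair.
conn.execute("""
    CREATE TABLE ActorMovie_surrogate (
        ActorMovieID INTEGER PRIMARY KEY AUTOINCREMENT,
        ActorID INTEGER NOT NULL,
        MovieID INTEGER NOT NULL
    )
""")
# Variant B: compound primary key on the pair.
conn.execute("""
    CREATE TABLE ActorMovie_compound (
        ActorID INTEGER NOT NULL,
        MovieID INTEGER NOT NULL,
        PRIMARY KEY (ActorID, MovieID)
    )
""")


def insert_link_twice(table):
    """Insert the same (ActorID, MovieID) pair twice; return how many rows were kept."""
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} (ActorID, MovieID) VALUES (1, 100)")
        except sqlite3.IntegrityError:
            pass  # only the compound primary key raises here
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


surrogate_rows = insert_link_twice("ActorMovie_surrogate")  # duplicate link slips in
compound_rows = insert_link_twice("ActorMovie_compound")    # duplicate link rejected
```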
In conclusion: yes, there are cases when a surrogate key should (or even must) be used, but in your particular case it would definitely be an antipattern. Use a compound key.
References:
Primary keys in SQL
SQL Antipatterns by Bill Karwin
Let me mention a detail that seems to have been missed by the other posters: InnoDB tables are clustered.
If you have just a primary key, your whole table will be represented by a lone B-Tree, which is very efficient. Adding a surrogate would just create another B-Tree (and a "fatter" one than expected, due to how clustering works), without any benefit to offset the added overhead.
Surrogates have their place, but junction tables are usually not it.
I'd always go with the composite key. My reasoning:
You will probably never use the surrogate key anywhere.
You will reduce the number of indexes/constraints on the table, as you will almost certainly still need indexes on actor and movie.
You will always search for either a movie or an actor anyway.
Unless you have a scenario where you will actually use the surrogate key outside of its own table, I'd go with the composite key.
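A sketch of the composite-key layout for both search directions, under assumptions: SQLite (via Python's sqlite3) stands in for MySQL, and SQLite's WITHOUT ROWID option is used because it stores the table in its primary-key B-tree, which is roughly comparable to InnoDB's clustered primary key. The index name idx_actormovie_movie is hypothetical. The primary key serves "movies for an actor"; one extra index on MovieID serves "actors in a movie". Printing by_actor and by_movie shows which structure each lookup uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: the table itself is the primary-key B-tree (a clustered layout).
conn.execute("""
    CREATE TABLE ActorMovie (
        ActorID INTEGER NOT NULL,
        MovieID INTEGER NOT NULL,
        PRIMARY KEY (ActorID, MovieID)
    ) WITHOUT ROWID
""")
# Secondary index for lookups by movie.
conn.execute("CREATE INDEX idx_actormovie_movie ON ActorMovie (MovieID)")
conn.executemany("INSERT INTO ActorMovie VALUES (?, ?)", [(1, 100), (2, 100), (1, 200)])


def plan(sql):
    """Join the human-readable detail column (the last one) of EXPLAIN QUERY PLAN."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


by_actor = plan("SELECT MovieID FROM ActorMovie WHERE ActorID = 1")
by_movie = plan("SELECT ActorID FROM ActorMovie WHERE MovieID = 100")
actors_in_100 = [
    r[0] for r in conn.execute("SELECT ActorID FROM ActorMovie WHERE MovieID = 100 ORDER BY ActorID")
]
```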
If you want to associate other data elements with the join table, such as the name(s) of the role(s) played (which might be a child table), then I certainly would add the surrogate key. If you were sure that you would never want to, then I'd consider it optional.
Consider the first normal form (1NF) of database design normalization.
I would make the combination of ActorID and MovieID a unique key and then create a primary key ActorMovieID.
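A minimal sketch of that layout, with SQLite (via Python's sqlite3) standing in for MySQL: a surrogate ActorMovieID primary key plus a UNIQUE constraint on the pair. The surrogate value is available for other tables to reference, while the UNIQUE constraint still refuses duplicate links; in SQLite it shows up as an automatically created index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ActorMovie (
        ActorMovieID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate primary key
        ActorID INTEGER NOT NULL,
        MovieID INTEGER NOT NULL,
        UNIQUE (ActorID, MovieID)                        -- the pair stays unique
    )
""")
cur = conn.execute("INSERT INTO ActorMovie (ActorID, MovieID) VALUES (1, 100)")
link_id = cur.lastrowid  # the surrogate value other tables could reference

try:
    conn.execute("INSERT INTO ActorMovie (ActorID, MovieID) VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The table B-tree is keyed by ActorMovieID; the UNIQUE constraint adds an
# automatic index over (ActorID, MovieID).
index_names = [row[1] for row in conn.execute("PRAGMA index_list('ActorMovie')")]
```

The trade-off discussed in this thread is visible here: you get a single-column key to reference, at the cost of a second index that must be maintained alongside the table.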
See the same question here: Two foreign keys instead of primary
On this subject, my point is very simple: surrogate primary keys ALWAYS work, while composite keys MIGHT NOT work one of these days, for multiple reasons.
So when you start asking yourself "is composite better than surrogate?", you have already started losing your time. Go for surrogate. It always works. Then move on to the next step.

Why we should have an ID column in the table of users?

It's obvious that we already have another unique information about each user, and that is username. Then, why we need another unique thing for each user? Why should we also have an id for each user? What would happen if we omit the id column?
Even if your username is unique, there are a few advantages to having an extra id column instead of using the varchar as your primary key.
Some people prefer to use an integer column as the primary key, to serve as a surrogate key that never needs to change, even if other columns are subject to change. Although there's nothing preventing a natural primary key from being changeable too, you'd have to use cascading foreign key constraints to ensure that the foreign keys in related tables are updated in sync with any such change.
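A minimal sketch of the cascading approach for a changeable natural key, with SQLite (via Python's sqlite3) standing in for MySQL; the posts table is hypothetical. SQLite only enforces and cascades foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce/cascade FKs
conn.execute("CREATE TABLE users (username TEXT NOT NULL PRIMARY KEY)")
# Hypothetical child table that references the natural key.
conn.execute("""
    CREATE TABLE posts (
        post_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL
                 REFERENCES users(username) ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO users VALUES ('alice')")
conn.executemany("INSERT INTO posts (username) VALUES (?)", [("alice",), ("alice",)])

# Renaming the user: the cascade rewrites every referencing row in posts.
conn.execute("UPDATE users SET username = 'alice_w' WHERE username = 'alice'")
post_owners = [r[0] for r in conn.execute("SELECT username FROM posts ORDER BY post_id")]
```

With a surrogate integer key instead, the rename would touch only the users row, since posts would store the unchanging id.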
The primary key being a 32-bit integer instead of a varchar can save space. This matters most for the foreign key columns in every other table that references your user table: each of them stores an int instead of a varchar.
Inserting into the primary key index is a little more efficient if you add new rows at the end of the index, compared to wedging them into the middle of the index. Indexes in MySQL tables are usually B+Tree data structures, and you can study these to understand how they perform.
Some application frameworks prefer the convention that every table in your database has a primary key column called id, instead of using natural keys or compound keys. Following such conventions can make certain programming tasks simpler.
None of these issues are deal-breakers. And there are also advantages to using natural keys:
If you look up rows by username more often than you search by id, it can be better to choose the username as the primary key, and take advantage of the index-organized storage of InnoDB. Make your primary lookup column be the primary key, if possible, because primary key lookups are more efficient in InnoDB (you should be using InnoDB in MySQL).
As you noticed, if you already have a unique constraint on username, it seems a waste of storage to keep an extra id column you don't need.
Using a natural key means that foreign keys contain a human-readable value, instead of an arbitrary integer id. This allows queries to use the foreign key value without having to join back to the parent table for the "real" value.
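A small sketch of that last point, with SQLite (via Python's sqlite3) standing in for MySQL; the comments table is hypothetical. Because the foreign key holds the username itself, the query returns readable owners without joining back to users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT NOT NULL PRIMARY KEY, full_name TEXT)")
# Hypothetical child table keyed to the natural key.
conn.execute("""
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY,
        username   TEXT NOT NULL REFERENCES users(username),
        body       TEXT
    )
""")
conn.execute("INSERT INTO users VALUES ('bob', 'Bob Jones')")
conn.executemany(
    "INSERT INTO comments (username, body) VALUES (?, ?)",
    [("bob", "first"), ("bob", "second")],
)

# No JOIN to users: the foreign key column already holds the readable value.
rows = conn.execute("SELECT username, body FROM comments ORDER BY comment_id").fetchall()
```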
The point is that there's no rule that covers 100% of cases. I often recommend that you should keep your options open, and use natural keys, compound keys, and surrogate keys even in a single database.
I cover some issues of surrogate keys in the chapter "ID Required" in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
This identifier is known as a surrogate key, and it has both advantages and disadvantages.
In practice, I have found them to be advantageous because even superkey data can change over time (e.g. a user's email address may change, and then every corresponding relation must change too), but a surrogate key never needs to change for the data it identifies, because its value is meaningless to the relation.
It's also nice from a JOIN standpoint because it can be an integer with a smaller key length than a varchar.
I can say that in practice I prefer to use them. I have been bitten too many times by having multiple-column primary keys or a data-representative superkey used across tables having to become non-unique later due to changing requirements during development, and that is not a situation you want to deal with.
In my opinion, every table should have a unique, auto-incremented id.
Here are some practical reasons. If you have duplicate rows, you can readily determine which row to delete. If you want to know the order that rows were inserted, you have that information in the id. As for users, there's more than one "John Smith" in the world. An id provides a key for foreign references.
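A minimal sketch of the duplicate-row point, with SQLite (via Python's sqlite3) standing in for MySQL and hypothetical sample rows. Two rows are exact duplicates (same name and email); a second "John Smith" with a different email is a different person and is kept. The id tells you which copy to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", [
    ("John Smith", "john@example.com"),
    ("John Smith", "jsmith@example.org"),  # a different John Smith
    ("John Smith", "john@example.com"),    # accidental duplicate of id 1
])

# Keep the earliest copy of each (name, email) and delete later copies by id.
conn.execute("""
    DELETE FROM users
    WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY name, email)
""")
remaining = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
```

MySQL rejects a DELETE whose subquery reads from the same table, so there the same idea is usually written as a self-join DELETE.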
Finally, just about anything that might describe a user -- a name, an address, a telephone number, an email address -- could change over time.
In MySQL we have:
1. indexed fields, 2. unique fields, and 3. PK fields.
An index makes a field quick to look up.
Unique means a value may appear only once across all rows of the table.
PK = index + unique (+ NOT NULL).
A table may have lots of unique fields, like
username, passport code, or email.
But you still need a field like ID that is both unique and indexed (i.e. the PK), because: first, it is a single value that never changes; second, it is unique; third, it is simple (because it is usually a number).
One reason to have a numeric id is that an index on it is leaner than an index on a text field, reducing the index size and the processing time required to look up a specific user. It also takes fewer bytes to store when cross-referencing a user from a different table (relational database).

When we don't need a primary key for our table?

Will it ever happen that we design a table that doesn't need a primary key?
No.
The primary key does a lot of stuff behind-the-scenes, even if your application never uses it.
For example: clustering improves efficiency (because heap tables are a mess).
Not to mention, if ANYONE ever has to do something on your table that requires pulling a specific row and you don't have a primary key, you are the bad guy.
Yes.
If you have a table that will always be fetched completely, and is referred to by no other tables, such as some kind of standalone settings or configuration table, then there is no point in having a primary key, and some could argue that adding a PK in this situation would misrepresent how such a table is normally used.
It is rare, and probably when it is most often done it is done wrongly, but they do exist, and such instances can be valid.
Depends.
What is primary key / unique key?
In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can have the same value (or combination of values) in those columns if NULL values are not used. Depending on its design, a table may have arbitrarily many unique keys but at most one primary key.
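The "if NULL values are not used" qualification matters in practice: a UNIQUE column still accepts several NULLs, because NULLs are not considered equal to each other. A minimal sketch with SQLite (via Python's sqlite3) standing in for MySQL, where InnoDB UNIQUE indexes behave the same way; the accounts table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id    INTEGER PRIMARY KEY,
        email TEXT UNIQUE  -- a unique key, but nullable
    )
""")
conn.executemany(
    "INSERT INTO accounts (email) VALUES (?)",
    [(None,), (None,), ("a@example.com",)],
)

# A second non-NULL duplicate is refused by the unique key.
try:
    conn.execute("INSERT INTO accounts (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute("SELECT COUNT(*) FROM accounts WHERE email IS NULL").fetchone()[0]
```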
So, when you don't have to differentiate (uniquely identify) each row,
you don't have to use primary key
For example, a big table for logs,
without a primary key, your data can be somewhat smaller and insertion faster.
A primary key is not mandatory, but it is not good practice to create tables without one. The DBMS automatically creates an index on the PK, but you can also make a column unique and index it; e.g. the user_name column in a users table is usually made unique and indexed, so you may choose to skip the PK there. But that is still a bad idea, because a PK can be referenced by foreign keys for referential integrity.
In general, you should almost always have PK in a table unless you have very strong reason to justify not having a PK.
Link tables (in many-to-many relationships) may not have a primary key. But I personally like to have a PK in those tables as well.