Add Primary Key to a table with existing clustered index - sql-server-2008

I have to work with a database to do reporting.
The DB is quite big: 416 055 104 rows.
Each row is very light though, just booleans and int ids.
Each row is identified by 3 columns but, to my surprise, there is no Primary Key on it.
Only a Clustered Index with a unique constraint.
So, knowing that, I have 2 questions.
Could there be ANY good reason for that?
Is there any way I can turn this into a primary key.
Regarding question 2
Creating a new primary key also creates a non-clustered index to associate with (there is already an existing clustered one).
This is not what I am looking for. I want to keep that same index, but also make it a primary key.
Is it possible?
Would that be faster than creating the whole index again? (I hope so)
What could be the consequences? (locks? crash? corrupted data?)

There is little or no difference between a PRIMARY KEY and a UNIQUE constraint on non-nullable columns. So if the columns in question are non-nullable then I suggest you do nothing. The main reason to make a candidate key into a primary key is if you have some software (such as a data modelling tool or other development tool) that expects the key to be identified with a PRIMARY KEY constraint.

Good question.
If you already have a unique index on non nullable columns then you have a candidate key. I'm not aware of any particular benefit of making this an "official" primary key. In fact I have a feeling that not making it a PK will give greater flexibility.

A unique index can allow null values. A primary key can't.
I believe you can't "mark" an existing index as the primary key. You'd have to drop it and recreate it. To keep other sessions from touching the table while the index is missing, I'd say it'd be good to place a TABLOCKX, HOLDLOCK on the table before doing that.
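As a rough sketch of that drop-and-recreate approach in SQL Server: the table, index, constraint, and column names below are hypothetical (the question does not give them), and the three key columns are assumed to be NOT NULL. Dropping and recreating the clustered index rewrites the table, so on 416 million rows expect it to take a while, with the exclusive lock held throughout.

```sql
BEGIN TRANSACTION;

-- Take an exclusive table lock and hold it until COMMIT.
SELECT TOP (1) 1 FROM dbo.ReportRows WITH (TABLOCKX, HOLDLOCK);

-- Drop the existing unique clustered index (hypothetical name).
-- If it was created as a UNIQUE constraint, use
--   ALTER TABLE dbo.ReportRows DROP CONSTRAINT UQ_ReportRows;
-- instead.
DROP INDEX UQ_ReportRows ON dbo.ReportRows;

-- Recreate it as a clustered primary key on the same three columns.
ALTER TABLE dbo.ReportRows
    ADD CONSTRAINT PK_ReportRows
    PRIMARY KEY CLUSTERED (ColA, ColB, ColC);

COMMIT TRANSACTION;
```

Try it on a copy of the table first to get a feel for the duration.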

Related

In a one to zero/one relationship, (MySQL, InnoDB) are there any performance improvements by using the foreign key as primary key?

Reading this blog post (which is quite old, to be fair) gave me some thoughts on how to structure some of my table relationships.
If any secondary index look ups contain the primary key and the primary key is what accesses the row data, then for this sort of schema,
Users : id, name, country, etc.
User_Mailboxes : id, user_id, location, height, etc.
Where a user may or may not have a mailbox but at max one, would it be more efficient to get rid of the 'id' as the primary key of the user_mailboxes table and set the foreign key as the primary key?
From my understanding of InnoDB, that way we'd save any secondary index look ups for the corresponding primary key and be able to use the User.id directly to find the related mailbox info.
So something more akin to this,
Users : id (PRIMARY), name, country, etc.
User_Mailboxes : user_id(FOREIGN, PRIMARY), location, height, etc.
Should be slightly more performant in terms of index storage and random lookups? Especially if I'm considering grabbing a bunch of mailboxes at once based on some user criteria?
In a 1:0..1 relationship you can (and should) do that, yes. I would do so, too. Storing unneeded information isn't good anyway.
But don't be disappointed if you don't gain as much performance as you expect. Unless you grab a really large number of users/mailboxes at once, you won't notice much difference, if any.
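A minimal sketch of that layout in MySQL/InnoDB, in case it helps. The column types and the constraint name are assumptions; only the table and column names come from the question.

```sql
CREATE TABLE users (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name    VARCHAR(100) NOT NULL,
    country CHAR(2)      NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- user_id is both the primary key and the foreign key,
-- so each user can have at most one mailbox row.
CREATE TABLE user_mailboxes (
    user_id  INT UNSIGNED NOT NULL,
    location VARCHAR(100) NOT NULL,
    height   INT          NOT NULL,
    PRIMARY KEY (user_id),
    CONSTRAINT fk_user_mailboxes_user
        FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;

-- Mailboxes for a set of users: the join goes straight to the
-- clustered primary key of user_mailboxes.
SELECT u.id, u.name, m.location
FROM users u
LEFT JOIN user_mailboxes m ON m.user_id = u.id
WHERE u.country = 'NZ';
```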
A FOREIGN KEY slows down inserts because of the check that is performed. (But it is a minor performance hit.)
Nothing "requires" the existence of any FKs. (Other than some textbooks.)
A FOREIGN KEY implicitly creates an INDEX (if one does not already exist).
INDEXes can make a big difference in the performance of SELECTs.
The following trick can shift 'important' queries to the PRIMARY KEY, so that range fetches on foo take advantage of the PK's clustering.
-- Instead of
PRIMARY KEY (id),
INDEX (foo)
-- change to
PRIMARY KEY (foo, id), -- `id` included to achieve UNIQUEness
INDEX (id) -- to keep AUTO_INCREMENT happy
If you have a 'natural' primary key, use it and jettison the auto_increment. (In some cases this helps overall; in some cases, it hurts.)
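Spelled out as a full table definition, the trick might look like this. The table name, the payload column, and the column types are made up for illustration; foo and id are the names used above.

```sql
CREATE TABLE events (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    foo     INT          NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (foo, id),  -- rows are clustered by foo; id makes the key unique
    INDEX (id)              -- AUTO_INCREMENT needs id to lead some index
) ENGINE=InnoDB;

-- A range scan on foo now reads adjacent rows from the clustered index.
SELECT id, payload
FROM events
WHERE foo BETWEEN 10 AND 20;
```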

MySQL: Create a Foreign key without an Index

Is it possible to have a foreign key without an index in MySQL 5.6.34? I want that because I created a nullable column in 20M rows with a foreign key to another table. As this is a new feature, only the new rows MAY have this column filled with an actual value, and as you may expect, the cardinality of that index becomes horrible. So, most of the time, using that index is actually a bad idea. The problem: I have tons of queries that share this same restriction:
[...] from large_table where tenant_id = ? and nullable_foreign_key_with_index is null and [...]
The issue? MySQL thinks that it's a good idea to use an index_merge/intersect strategy for query resolution. In this case MySQL would do 2 queries in parallel: one with tenant_id (which uses a valid and good index) and another one with nullable_foreign_key_with_index which is bad, almost a "full table scan in parallel" given that the cardinality of this index is <1000 in a table with >20M rows. More details about this "problem" in here
So, what are the possible solutions? Given that MySQL "forces" a foreign key to have an index attached:
Drop the foreign key and the index. This is bad, because in the case of a bug in the app we may compromise the referential integrity.
FOREIGN_KEY_CHECKS=0; Drop index; FOREIGN_KEY_CHECKS=1. This is bad because, even though the foreign key still exists, MySQL doesn't validate the column anymore to check if the value actually exists. Is that a bug?
Use query hints in all existing queries to make sure that we are only using the old and efficient "tenant_id_index". This is bad because I have to hunt down all existing queries and also remember to use it again when new queries are built.
So, how can I say: "MySQL, don't bother creating an index for this foreign key, but keep validating its content in the related table, which is indexed by primary key anyway". Am I missing something? The best idea so far is to remove the foreign key and just believe that the app is working as expected, which it probably is, but this would start a classic discussion about having constraints in APP vs DATABASE. Any ideas?
For this query:
from large_table
where tenant_id = ? and
nullable_foreign_key_with_index is null and [...]
Just add the index large_table(tenant_id, nullable_foreign_key_with_index).
MySQL should use this index for the table.
I'm pretty sure you can also do this backwards (I would be 100% sure if the comparison were to anything other than NULL, but I'm pretty sure MySQL does the right thing with NULL as well):
large_table(nullable_foreign_key_with_index, tenant_id)
And MySQL will recognize that this index works for the foreign key and not create any other index.
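A sketch of that second variant, with a hypothetical index name (idx_fk_tenant); the table and column names are the ones from the question:

```sql
-- The foreign key column leads the index, so InnoDB can use it
-- to enforce the constraint as well as to serve the query.
ALTER TABLE large_table
    ADD INDEX idx_fk_tenant (nullable_foreign_key_with_index, tenant_id);
```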
Q: How can I say: "MySQL, don't bother creating an index for this foreign key, but keep validating it's content in the related table, which is indexed by primary key anyway"
A: No can do. InnoDB requires a suitable index to support the enforcement of foreign key constraint.
Consider the flip side of it... if we are going to DELETE a row in the parent table, then InnoDB needs to check the foreign key constraint.
That means InnoDB needs to check the contents of the child table, to find rows that have a specific value in foreign key column. Essentially equivalent to
SELECT ... FROM child_table c WHERE c.foreign_key_col = ?
And to do that, InnoDB requires that there be an index on child_table that has foreign_key_col as the leading column.
The options suggested in the question (disabling or dropping the foreign key) will work because then InnoDB isn't going to enforce the foreign key. But as noted in the question, what this means is that the foreign key isn't enforced, which defeats the purpose of the foreign key. The application code could be responsible for enforcing referential integrity, or we could write some ug-gghhh-ly triggers (no, we don't want to go there).
As Gordon already noted in his (as usual excellent) answer... the problem isn't really dropping the index on the foreign key column. The actual problem is the inefficient execution plan. And the most likely fix for that is to make sure a more suitable index is available.
Composite indexes are the way to go. An index like this:
... ON child_table (foreign_key_col,tenant_id,...)
would satisfy the requirement of the foreign key: an index with the foreign key column as the leading column. Then drop the (now redundant) index on just the singleton foreign_key_col.
This index could also be used to satisfy the query that's using a horrible index merge access plan. (Verify with EXPLAIN.)
Also, consider adding columns (such as foreign_key_col) to the index that has tenant_id as the leading column
... ON child_table (tenant_id,...,foreign_key_col,...)
and drop the redundant index on the singleton tenant_id col.
Summary: Almost always it is better to have a composite index instead of depending on "index merge intersect".
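To make the "verify with EXPLAIN" step concrete, something like the following could be run. The index name idx_tenant_fk and the literal tenant id are made up for illustration; tenant_id_index is the name the question uses for the existing single-column index, and the elided parts of the original query are left out.

```sql
-- Widen the tenant index to cover the foreign key column and
-- drop the now redundant single-column index.
ALTER TABLE large_table
    ADD INDEX idx_tenant_fk (tenant_id, nullable_foreign_key_with_index),
    DROP INDEX tenant_id_index;

-- Check which access path the optimizer picks now.
EXPLAIN
SELECT COUNT(*)
FROM large_table
WHERE tenant_id = 42
  AND nullable_foreign_key_with_index IS NULL;
```

The thing to look for in the output is that the type column no longer shows index_merge and that the key column names the composite index.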
If both columns are tested with = (or IS NULL), it does not matter which order the columns are in the index definition. That is, cardinality is irrelevant.

Is a primary key necessary? [duplicate]

In database systems, should every table have a primary key?
For example, I have a table table1(foreignkey1,foreignkey2,attribute) like this. table1 does not have a primary key.
Should I define a primary key for this table like table1id?
This is a subjective question, so I hope you don't mind me answering with some opinion :)
In the vast majority of tables I've made – I'm talking 95%+ – I've added a primary key, and been glad I did. This is either the most critical unique field in my table (think "social security number") or, more often than not, just an auto-incrementing number that allows me to quickly and easily refer to a field when querying.
This latter use is the most common, and it even has its own name: a "surrogate" or "synthetic" key. This is a value auto-generated by the database and not derived from your application data. If you want to add relations between your tables, this surrogate key is immediately helpful as a foreign key. As someone else answered, these keys are so common that MySQL likes to add one even if you don't, so I'd suggest that means the consensus is very heavily biased towards adding primary keys.
One other thing I like about primary keys is that they help convey your intent to others reading your table schemata and also to your DBMS: "this bit is how I intend to identify my rows uniquely, don't let me try to break that rule!"
To answer your question specifically: no, a primary key is not necessary. But realistically if you intend to store data in the table for any period of time beyond a few minutes, I would very strongly recommend you add one.
No, it is not required for every table to have a primary key. Whether or not a table should have a primary key is based on requirements of your database.
Even though this is allowed, it is bad practice because it allows duplicate rows to be added, which prevents rows from being uniquely identified. That contradicts the underlying purpose of having a database.
I am a strong fan of synthetic primary keys. These are auto-incremented columns that uniquely identify each row.
These provide functionality such as:
Ability to see the order of insertion of rows. Which were inserted most recently?
Ability to create a foreign key relationship to the table. You might not need one now, but it might be useful in the future.
Ability to rename "data" columns without affecting other tables.
Presumably, for your table, you can define a primary key on (foreignkey1, foreignkey2). Composite primary keys are also sensible, but they are cumbersome for foreign key relationships and joins. And, when there are foreign key relationships, they may cause additional storage, because the composite key ends up being stored across multiple tables.
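For comparison, the synthetic-key variant might be sketched like this in MySQL syntax. The column types are assumptions, and the UNIQUE constraint on the two foreign key columns is an addition not mentioned above, included only to show that the pair can still be kept unique alongside a surrogate key.

```sql
CREATE TABLE table1 (
    table1id    INT NOT NULL AUTO_INCREMENT,  -- synthetic (surrogate) key
    foreignkey1 INT NOT NULL,
    foreignkey2 INT NOT NULL,
    attribute   INT,
    PRIMARY KEY (table1id),
    -- Assumption: each (foreignkey1, foreignkey2) pair should appear once.
    UNIQUE KEY uq_table1_fks (foreignkey1, foreignkey2)
);
```

Other tables can then reference table1id with a single-column foreign key instead of repeating both columns.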
It's good practice to have a primary key (or composite primary key) on a table:
it helps to join tables,
clustered tables need a primary key.
Database design should include a primary key for each table.
In MySQL, the InnoDB storage engine always builds a clustered index: if you don't specify a PRIMARY KEY explicitly (and there is no UNIQUE index on NOT NULL columns it can use instead), it adds a hidden row ID column you don't have access to.
You can create Composite Primary key like:
CREATE TABLE table1 (
    FK1 INT NOT NULL,        -- primary key columns must not be nullable
    FK2 INT NOT NULL,
    ATTRIBUTE INT,
    PRIMARY KEY (FK1, FK2)
);
or add a constraint to an existing table1:
ALTER TABLE table1
ADD CONSTRAINT pk_table1 PRIMARY KEY (FK1, FK2);

MySQL Tables with Temp Data - Include a Primary Key?

I'm putting together a new database and I have a few tables that contain temp data.
e.g.: user requests to change password - a token is stored and then later removed.
Currently I have a primary key on these tables that will auto-increment from 1 upwards.
AUTO_INCREMENT = 1;
I don't really see any use for this primary key... I will never reference it and it will just get larger.
Should tables like this have a primary key or not?
Short answer: yes.
Long answer:
You need your table to be joinable on something.
If you want your table to be clustered, you need some kind of a primary key.
If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?
In MySQL, the InnoDB storage engine always creates a PRIMARY KEY if you didn't specify it explicitly, thus making an extra column you don't have access to.
Note that a PRIMARY KEY can be composite.
If you have a many-to-many link table, you create the PRIMARY KEY on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.
Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a UNIQUE index.
And since any PRIMARY KEY involves creating a UNIQUE index, you should declare it and get both logical consistency and performance.
There is already an SO thread with the same discussion.
Some people still prefer to go with your opinion. Have a look here.
My personal opinion is that you should have primary keys, to identify or to make a row unique. The logic can be your program logic. Can be an auto-increment or composite or whatever it can be.
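For the password-reset example from the question, one possible shape is to let the token itself be the primary key instead of an unused auto-increment column. This is only a sketch: the table name, column names, and types are all assumptions, since the question does not show its schema.

```sql
CREATE TABLE password_reset_tokens (
    token      CHAR(64)     NOT NULL,  -- the value that is actually looked up
    user_id    INT UNSIGNED NOT NULL,
    created_at DATETIME     NOT NULL,
    PRIMARY KEY (token),
    INDEX (user_id)
) ENGINE=InnoDB;

-- Lookup and cleanup both go through the primary key.
SELECT user_id FROM password_reset_tokens WHERE token = ?;
DELETE FROM password_reset_tokens WHERE token = ?;
```

The `?` markers are prepared-statement placeholders to be bound by the application.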

When we don't need a primary key for our table?

Will it ever happen that we design a table that doesn't need a primary key?
No.
The primary key does a lot of stuff behind-the-scenes, even if your application never uses it.
For example: clustering improves efficiency (because heap tables are a mess).
Not to mention, if ANYONE ever has to do something on your table that requires pulling a specific row and you don't have a primary key, you are the bad guy.
Yes.
If you have a table that will always be fetched completely, and is being referred-to by zero other tables, such as some kind of standalone settings or configuration table, then there is no point having a primary key, and the argument could be made by some that adding a PK in this situation would be a deception of the normal use of such a table.
It is rare, and probably when it is most often done it is done wrongly, but they do exist, and such instances can be valid.
Depends.
What is primary key / unique key?
In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can have the same value (or combination of values) in those columns if NULL values are not used. Depending on its design, a table may have arbitrarily many unique keys but at most one primary key.
So, when you don't have to differentiate (uniquely identify) each row, you don't have to use a primary key.
For example, in a big table for logs, going without a primary key gives you a somewhat smaller data size and faster inserts.
A primary key is not mandatory, but it is not good practice to create tables without one. The DBMS automatically creates an index on the PK, but you can also make a column unique and index it; e.g. the user_name column in a users table is usually made unique and indexed, so you may choose to skip the PK here. But it is still a bad idea, because the PK can be used as the target of a foreign key for referential integrity.
In general, you should almost always have PK in a table unless you have very strong reason to justify not having a PK.
Link tables (in many to many relationship) may not have a primary key. But, I personally like to have PK in those tables as well.