Multiple column key or multiple column unique - mysql

Here's what's confusing me. I often have composite primary keys in database tables. The downside of that approach is that I have quite a lot of extra work when I delete or edit entries. However, I feel that this approach is in the spirit of database design.
On the other hand, some friends of mine never use composite keys; instead they introduce another 'id' column in a table, and all other keys are just FKs. They have much less work when coding delete and edit procedures. However, I do not know how they preserve uniqueness of data entries.
For example:
Way 1
create table ProxUsingDept (
fkProx int references Prox(ProxID) NOT NULL,
fkDept int references Department(DeptID) NOT NULL,
Value int,
PRIMARY KEY(fkProx,fkDept)
)
Way 2
create table ProxUsingDept (
ID int NOT NULL IDENTITY PRIMARY KEY,
fkProx int references Prox(ProxID) NOT NULL,
fkDept int references Department(DeptID) NOT NULL,
Value int
)
Which way is better? What are the bad sides of using the 2nd approach? Any suggestions?

I personally prefer your 2nd approach (and would use it almost 100% of the time) - introduce a surrogate ID field.
Why?
makes life a lot easier for any tables referencing your table - the JOIN conditions are much simpler with just a single ID column (rather than 2, 3, or even more columns that you need to join on, all the time)
makes life a lot easier since any table referencing your table only needs to carry a single ID as foreign key field - not several columns from your compound key
makes life a lot easier since the database can handle the creation of the unique ID column (using INT IDENTITY)
However, I do not know how they preserve uniqueness of data entries.
Very simple: put a UNIQUE INDEX on the compound columns that you would otherwise use as your primary key!
CREATE UNIQUE INDEX UIX_WhateverNameYouWant
ON dbo.ProxUsingDept(fkProx, fkDept)
Now, your table guarantees there will never be a duplicate pair of (fkProx, fkDept) in your table - problem solved!
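A minimal sketch of Way 2 plus the unique index, runnable with Python's built-in sqlite3 module. SQLite is only a stand-in here: its INTEGER PRIMARY KEY plays the role of INT IDENTITY, the Prox and Department parent tables are left out, and the index name UIX_ProxDept is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ProxUsingDept (
        ID     INTEGER PRIMARY KEY,  -- surrogate key, auto-assigned by SQLite
        fkProx INTEGER NOT NULL,
        fkDept INTEGER NOT NULL,
        Value  INTEGER
    )
""")
# The unique index enforces what the composite PK would otherwise enforce.
conn.execute("CREATE UNIQUE INDEX UIX_ProxDept ON ProxUsingDept (fkProx, fkDept)")

conn.execute("INSERT INTO ProxUsingDept (fkProx, fkDept, Value) VALUES (1, 10, 5)")
conn.execute("INSERT INTO ProxUsingDept (fkProx, fkDept, Value) VALUES (1, 11, 7)")

try:
    # Same (fkProx, fkDept) pair again: rejected even though ID would differ.
    conn.execute("INSERT INTO ProxUsingDept (fkProx, fkDept, Value) VALUES (1, 10, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ProxUsingDept").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

The surrogate ID stays the handle for edits and deletes, while the index keeps the natural pair unique.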

You ask the following questions:
However, I do not know how they preserve uniqueness of data entries.
Uniqueness can be preserved by declaring a separate composite UNIQUE index on columns that would otherwise form the natural primary key.
Which way is better?
Different people have different opinions, sometimes strongly held. I think you will find that more people use surrogate integer keys (not that that makes it the "right" solution).
What are the bad sides of using the 2nd approach?
Here are some of the disadvantages to using a surrogate key:
You require an additional index to maintain the uniqueness of the natural key.
You sometimes require additional JOINs when selecting data to get the results you want. This happens when the query could be satisfied using only the columns of the composite natural key: with a composite key, a referencing table already carries those columns as foreign keys and can use them directly, rather than JOINing back to the original table.

There are cases, like M:N join tables, where composite keys make the most sense (and if the nature of the M:N link changes, you'll have to rework this table anyway).

I know this post was made a very long time ago, but I came across a similar situation regarding composite keys, so I am posting my thoughts.
Let's say we have two tables T1 and T2.
T1 has the columns C1 and C2.
T2 has the columns C1, C2 and C3.
(C1, C2) is the composite primary key of table T1 and a foreign key in table T2.
Now assume we used a surrogate key for table T1 (T1_ID) and used that as the foreign key in table T2. If the values of C1 and C2 in table T1 change, it takes additional work to enforce the referential integrity constraint on table T2, because the constraint looks only at the surrogate key from table T1, whose value didn't change. This could be one issue with the second approach.
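For comparison, when the composite natural key itself is the foreign key, the engine can propagate a change to (C1, C2) automatically with ON UPDATE CASCADE. This is a sketch in SQLite via Python's sqlite3 (a stand-in engine; SQLite only enforces foreign keys after PRAGMA foreign_keys = ON), and the column types are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE T1 (
        C1 INTEGER NOT NULL,
        C2 INTEGER NOT NULL,
        PRIMARY KEY (C1, C2)
    )
""")
conn.execute("""
    CREATE TABLE T2 (
        C1 INTEGER NOT NULL,
        C2 INTEGER NOT NULL,
        C3 TEXT,
        FOREIGN KEY (C1, C2) REFERENCES T1 (C1, C2) ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO T1 VALUES (1, 2)")
conn.execute("INSERT INTO T2 VALUES (1, 2, 'child')")

# Change the natural key in T1; the cascade rewrites the copy held in T2.
conn.execute("UPDATE T1 SET C1 = 5 WHERE C1 = 1 AND C2 = 2")
child = conn.execute("SELECT C1, C2, C3 FROM T2").fetchone()
print(child)  # (5, 2, 'child')
```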

Related

How do I make a field reference another field which is not a primary key in MySQL?

I know that foreign keys need not reference only primary keys but they can also reference a field that has a unique constraint on it. For my scenario, I am setting up a quiz where for each test, I have a set of questions. My table design is like this
The point is, in my 2nd table, where I will put all the answer options, I want the question number field to link to the question number field in the first table. How do I do this? Or is there an alternative to this design?
Thank you
Ideally there should be a question_id primary key column in the test_question table, and you would use this as the foreign key in the test_answer table.
With your composite primary key in the test_question table, you should make a corresponding composite foreign key:
CONSTRAINT FOREIGN KEY (test_id, question_no) REFERENCES test_question (test_id, question_no)
This is in addition to the foreign key just for the test_id column.
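A sketch of that composite foreign key in SQLite via Python's sqlite3. The test table and the answer_text column are hypothetical, since the question's actual table design isn't shown; foreign keys are switched on with PRAGMA foreign_keys because SQLite leaves them off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE test (test_id INTEGER PRIMARY KEY);
    CREATE TABLE test_question (
        test_id     INTEGER NOT NULL REFERENCES test (test_id),
        question_no INTEGER NOT NULL,
        PRIMARY KEY (test_id, question_no)
    );
    CREATE TABLE test_answer (
        test_id     INTEGER NOT NULL REFERENCES test (test_id),
        question_no INTEGER NOT NULL,
        answer_text TEXT NOT NULL,  -- hypothetical column
        FOREIGN KEY (test_id, question_no)
            REFERENCES test_question (test_id, question_no)
    );
""")
conn.execute("INSERT INTO test VALUES (1)")
conn.execute("INSERT INTO test_question VALUES (1, 1)")
conn.execute("INSERT INTO test_answer VALUES (1, 1, 'Paris')")  # OK

try:
    # Test 1 exists, but it has no question 2, so the composite FK rejects it.
    conn.execute("INSERT INTO test_answer VALUES (1, 2, 'Berlin')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

The single-column FK on test_id alone would have accepted that row; only the composite FK ties the answer to an existing question.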
Add another table purely for answers, and link them via the question_no field.
A DB table should hold information on one sort of item. Questions and answers are separate sorts of information, so they should be in separate tables. Adding a separate table also allows questions and answers to be changed independently. Additionally, if they are separate, you could add a language field to each table and have a multi-lingual quiz.
Short answer:
You can JOIN on any columns or expressions. There is no "requirement" for a FOREIGN KEY, PRIMARY KEY, UNIQUE, or anything else.
Long answer:
However, for performance (in large tables), some things make a difference.
If you are JOINing to a PK, unique key, or even an indexed column, the query could run faster.
Why have a FOREIGN KEY? An FK is two things:
A "constraint" that says the value must exist in the other table. Also, with things like ON DELETE CASCADE, it can specify actions to take if the referenced row is removed. The constraint requires looking in the other table each time a write occurs (e.g. INSERT).
An index. That is, in MySQL, specifying a FK automatically adds an INDEX (if one is not already present) to make the constraint check faster.
Getting the id
Here is the "usual" way to do a pair of inserts, where you need the second to 'point' to the first:
INSERT INTO t1 ... -- with an AUTO_INCREMENT id
grab LAST_INSERT_ID() -- that id
INSERT INTO t2 ... -- and include the id from above
For AUTO_INCREMENT to work, it must be the first column of some key. (Note: a PRIMARY KEY is a UNIQUE key, and a UNIQUE key is a key (aka INDEX).)
Optionally you can specify a FK on the second table to point out the connection between the tables.
And, as spelled out in other answers, a FK could involve more than one column.
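The insert-then-grab-the-id pattern, sketched with SQLite through Python's sqlite3: there, `cursor.lastrowid` plays the role of MySQL's LAST_INSERT_ID(). The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute("""
    CREATE TABLE t2 (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        t1_id INTEGER NOT NULL REFERENCES t1 (id),  -- the FK is optional
        note  TEXT
    )
""")

cur = conn.execute("INSERT INTO t1 (name) VALUES (?)", ("first",))
new_id = cur.lastrowid  # SQLite's counterpart of LAST_INSERT_ID()
conn.execute("INSERT INTO t2 (t1_id, note) VALUES (?, ?)", (new_id, "points at t1"))

linked = conn.execute(
    "SELECT t1.name, t2.note FROM t2 JOIN t1 ON t1.id = t2.t1_id"
).fetchall()
print(new_id, linked)  # 1 [('first', 'points at t1')]
```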
Entities and Relations
Sometimes, a set of tables like yours is best 'designed' this way:
Determine the "entities": users, tests, questions, answers
Relations, and whether they are 1:1, 1:many, or many:many. Users:tests is many-to-many; tests:questions is 1:many (unless you want questions to be shared between tests).
Answers are more complex, since each answer depends on both the user and the question.
1:1 -- rarely practical; may as well merge the tables together.
1:many -- a link (FK?) in one table to the other.
many:many -- need a bridge table with (usually) 2 columns, namely ids linking to the two tables.
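A minimal bridge-table sketch for the users:tests many-to-many relation, assuming hypothetical table and column names (users, tests, user_test) and SQLite via Python's sqlite3 as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tests (test_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Bridge table: two id columns, and the pair is the primary key.
    CREATE TABLE user_test (
        user_id INTEGER NOT NULL REFERENCES users (user_id),
        test_id INTEGER NOT NULL REFERENCES tests (test_id),
        PRIMARY KEY (user_id, test_id)
    );
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO tests VALUES (10, 'SQL basics'), (11, 'Indexes');
    INSERT INTO user_test VALUES (1, 10), (1, 11), (2, 10);
""")

# Which users took test 10?
takers = [row[0] for row in conn.execute("""
    SELECT u.name
    FROM user_test ut JOIN users u ON u.user_id = ut.user_id
    WHERE ut.test_id = 10
    ORDER BY u.name
""")]
print(takers)  # ['Ann', 'Bob']
```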

sql management studio

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing new database for a new project and I have a dilemma:
In DB theory, the primary key is a fundamental element, that's OK, but in REAL projects what are the advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
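To see the NULL behaviour, here is a sketch in SQLite via Python's sqlite3. Like MySQL, SQLite lets a UNIQUE index hold several NULLs (SQL Server's unique indexes allow only one), because NULL never compares equal to NULL. The table and index names are reused from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (foo INTEGER, bar INTEGER)")
conn.execute("CREATE UNIQUE INDEX ux_table1_foo ON table1 (foo)")

# NULL is never equal to NULL, so these rows do not collide on foo.
conn.execute("INSERT INTO table1 (foo, bar) VALUES (NULL, 1)")
conn.execute("INSERT INTO table1 (foo, bar) VALUES (NULL, 2)")

null_rows = conn.execute("SELECT COUNT(*) FROM table1 WHERE foo IS NULL").fetchone()[0]
print(null_rows)  # 2
```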
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
In SQL Server, if no clustered index is defined, then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the representation of the element
Meaning? Well, a primary key is used to identify the element: if you have a "Person", you would like to have a personal identification number (SSN or such) which is primary to your Person.
On the other hand, the person might have an e-mail address which is unique, but doesn't identify the person.
I always have primary keys, even in relationship tables (the mid-table / connection table). Why? Well, I like to follow a standard when coding: if the "Person" has an identifier and the Car has an identifier, then the Person -> Car link should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table
For transactional replication, you need the primary key. From Books Online:
Tables published for transactional replication must have a primary key. If a table is in a transactional replication publication, you cannot disable any indexes that are associated with primary key columns. These indexes are required by replication. To disable an index, you must first drop the table from the publication.
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as, always or never, are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add some information to MrWiggles' and Peter Parker's answers: when a table doesn't have a primary key, you won't be able to edit data in some applications (they will end up saying something like "cannot edit / delete data without primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column, while a PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables without primary keys.
UPDATE:
As far as I know, it is not possible to replicate tables without primary keys in MSSQL, at least not without problems.
If something is a primary key, then depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups on the primary key are much faster, because no extra dereferencing step is needed, as there is with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary key to be present. One situation comes to mind: when using enterprise replication with Informix, a PK must be present for a table to participate in replication.
As long as you do not allow NULL values, they should be handled the same, but NULL is handled differently across databases when a column is UNIQUE (AFAIK MS SQL Server does not allow more than one NULL value, while MySQL and Oracle allow multiple).
So you must define this column as NOT NULL with a UNIQUE index.
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce referential integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relatively small tables this is fine, but when you get to tables with GBs worth of data, and inserts/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than on multicolumn natural keys, and data only needs to change in one place (natural keys tend to need updating, which is a bad thing when they are in primary key/foreign key relationships). If you are going to need replication, use a GUID instead of an integer, but for the most part I prefer a key that is user readable, especially if users need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint are the same (*); and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens that all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is, you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Except for a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. E.g., if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
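The default-to-primary-key behaviour can be seen in SQLite via Python's sqlite3: a REFERENCES clause that names only the parent table points at that table's primary key. The parent and child tables are hypothetical, and other engines may differ (MySQL, for instance, expects the referenced columns to be listed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, label TEXT)")
# No column list after REFERENCES: the parent's primary key (id) is implied.
conn.execute("CREATE TABLE child (parent_id INTEGER REFERENCES parent)")

conn.execute("INSERT INTO parent (id, label) VALUES (1, 'A')")
conn.execute("INSERT INTO child (parent_id) VALUES (1)")  # matches parent.id

try:
    conn.execute("INSERT INTO child (parent_id) VALUES (99)")  # no parent row 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```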
In SQL Server, a unique index can have one NULL value, and by default it creates a NON-CLUSTERED INDEX.
A primary key cannot contain NULL values, and by default it creates a CLUSTERED INDEX.
In MSSQL, primary keys should be monotonically increasing for best performance on the clustered index. Therefore an IDENTITY integer is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single-field indices whenever the WHERE clause includes those fields, but it will only use a composite index when the WHERE clause includes its leading field(s): it can't use the second field of a composite index unless the first field is also constrained in your WHERE clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of the index key value to a primary key value, not to row IDs. This allows physical sorting operations and deletes to occur without having to recreate these indices.
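The leftmost-prefix point about composite indexes can be checked by asking the query planner. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3; the plan wording varies between SQLite versions (and other engines have their own EXPLAIN output), so it only looks for the hypothetical index name ix_ab. The table has not been ANALYZEd, so SQLite's skip-scan optimisation does not kick in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX ix_ab ON t (a, b)")  # composite index, a is leading

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_a = plan("SELECT * FROM t WHERE a = 1")             # leading column given
plan_ab = plan("SELECT * FROM t WHERE b = 2 AND a = 1")  # order in WHERE is irrelevant
plan_b = plan("SELECT * FROM t WHERE b = 2")             # leading column missing

print(plan_a)
print(plan_ab)
print(plan_b)
```

The first two queries can search ix_ab; the third falls back to scanning the table.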

Can I have too many columns in my composite primary key on one table

I have a table that uses 2 foreign key fields and a date field.
Is it common to have a table use 3 or more fields as a primary key? And are there any disadvantages to doing this?
--
My 3 tables are employees, training, and emp_training. The employees table holds employee data. Training table holds different training courses. And I am designing the emp_training table to be the fields EmployeeID (FK), TrainingID (FK), OnDate.
An employee can do multiple training courses, and can do the same training course multiple times. But they cannot do the same training course more than once on the same day.
Which is better to implement:
Option A - Make all 3 fields a primary key
Option B - Add an autonumber PK field, and use a query to find any potential duplicates.
I've created many tables before using 2 fields as a primary key, but never 3, so I'm curious whether there is any disadvantage to proceeding with option A.
It's worth to mention, that with SQL Server the PK by default is the one and only clustered key, but you are allowed to create a non-clustered PK as well.
You may define a new clustered index which is not the PK. "Primary Key" is just a name actually...
The most important question is: Which columns participate in a clustered key and (this is the very most important question): Do they have an implicit sorting? And (very important too): Are there many update operations which change the content of participating columns?
You must be aware, that a clustered key defines the physical order on your hard disc. In other words: The clustered key is the table itself. You can think of an index with all columns included. If your leading column (worst case) is a GUID, each insert to your table will not be in order. This leads to a 99.99% fragmentation.
If a clustered index is bound to the time of insert or a running number (best case), it will never become fragmented by inserts!
What makes things worse: If there is a clustered key (whether it's called PK or not), it will be used as lookup key for other indexes.
So in many cases it is best to use a running number as the clustered key, plus a non-clustered multi-column index, which is much faster to rebuild than it would be as the clustered one.
All indexes will profit from this!
My advice for you:
Option C: a running number as PK and additionally a unique multi-column-key to ensure data integrity. No need to use own logic here...
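Applied to the emp_training question, Option C might look like this sketch in SQLite via Python's sqlite3. The employees and training parent tables are left out, the EmpTrainingID column name and the ISO date strings for OnDate are assumptions, and SQLite does not reproduce SQL Server's clustering behaviour; the sketch only shows the integrity rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp_training (
        EmpTrainingID INTEGER PRIMARY KEY,  -- running number used as the PK
        EmployeeID    INTEGER NOT NULL,
        TrainingID    INTEGER NOT NULL,
        OnDate        TEXT    NOT NULL,     -- ISO date string, e.g. '2024-05-01'
        UNIQUE (EmployeeID, TrainingID, OnDate)  -- the business rule
    )
""")

def record(emp, course, day):
    """Insert one attendance; return False if the unique key rejects it."""
    try:
        conn.execute(
            "INSERT INTO emp_training (EmployeeID, TrainingID, OnDate) VALUES (?, ?, ?)",
            (emp, course, day),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    record(7, 3, "2024-05-01"),  # first attendance
    record(7, 3, "2024-06-01"),  # same course, another day
    record(7, 3, "2024-05-01"),  # same course, same day: rejected
]
print(results)  # [True, True, False]
```

No duplicate-finding query is needed: the unique key does the check on every insert.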
Yes, you can have a poor strategy in choosing too many columns for your composite primary key (PK) if a better strategy could be employed for uniqueness via secondary indexes.
Remember that the PK is special. There is only one physical / clustered ordering of your data. Changes to the data via inserts and updates (and the attendant shuffling) have overhead there that would not exist if maintained in a secondary index.
So the following can have not-so-insignificant differences:
A primary key with 5 composite columns
vs.
A primary key with 1 or 2 columns plus secondary indexes that maintain uniqueness, if thought through well
The former mandates movement of data between data pages to maintain the clustered index (the PK). This might explain why one so often sees:
(
id int auto_increment primary key,
...
)
in table designs.
Performance with Index Width:
A single-column PK like the id above is narrow. A 5-column composite PK can be quite wide. Wider keys propagating to child relationships will slow performance and concurrency.
Cases of FK compositions:
Special cases of compositions of foreign keys simply cannot be achieved without the use of a single column index, preferably the PK, as seen in this recent Answer of mine.
I don't think there is any problem with creating a table with a composite PK; such tables are needed in larger databases. There is no real problem in creating a table with two FKs which, together with the OnDate field, form the PK. Both ways are viable.
Good luck!
If you assign primary key on more than one column it will be composite primary key. For example,
CREATE TABLE employee(
training VARCHAR(10),
emp_training VARCHAR (20),
OnDate INTEGER,
PRIMARY KEY (training, emp_training, OnDate)
)
The combination of training, emp_training and OnDate will be unique, and none of these columns can be NULL.
As already stated you can have a single primary key which consists of multiple columns.
If the question was how to make the columns primary keys separately, that's not possible. However, you can create one primary key and add two unique keys.

Does a join table or associated table need a primary key?

Sorry, I'm pretty new at this,
I have 3 tables: table1, table2, and table12.
table1 has a PK table1_id and table2 has table2_id as a PK.
table12 has 3 attributes, FK table1_id, FK table2_id, and table12_name.
Is it wrong that I don't have a table12_id?
Thanks and sorry for the dumb post...
You probably should have an explicit primary key on table12. The question is, which of these makes a better primary key:
An artificial auto-incremented primary key, say Table12Id.
The pair (table1_id, table2_id)
Note that the second of these imposes a uniqueness constraint on the pair, which you probably want (if you allow duplicates, then you should definitely have an explicit id).
I am someone who strongly advocates using numeric, auto-incremented primary keys on all tables. However, for a junction table either method is fine. There is logic to this reasoning. All tables that represent entities should have unique keys. This table is an implementation of a relationship, so the composite primary key makes sense.
Note that depending on how you use the table, you might still want indexes on either or both of the components of foreign key columns.
Not necessarily wrong.
You should use a primary key, if you want to reference a specific row in a table.
The primary key must be a unique value identifying the row.
Sometimes you do not need a primary key, as in your case, where table12 seems to be a table that connects the other two and names that connection.
You might want to make the connection unique (the pair of ID values in table12).
Following the structures of your 3 tables, it is clear you had a ManyToMany relationship between table1 and table2, so that is why you created an associative third table table12.
Actually it is just great what you did: table12 will have a primary key which is combined of table2_id and table1_id whereas table12_name is optional and depends on your needs.
So to answer your question directly: you already have a table12_id, namely the combination of table1_id and table2_id, which (optionally together with table12_name) should be the primary key of table12.

RDBMS best practices - autoid for association table?

I have two tables, let's say they are called table A and table B. An item from table B can be present in multiple instances of A, and each A can contain multiple Bs so I have a table called a_b which links them together by their primary keys. My question is when I define this association table, should I have a primary key on the association table? Or is it not needed? Just trying to avoid ending up on TDWTF, that's all :)
The primary key would be on the table A PK column and table B PK column in your association table. That way, you ensure you don't get any duplicate rows in your association table by accident.
One of the main purposes of primary keys is to guarantee entity integrity, that is, to keep the data in your table clean, with no duplicates. The PK in this case will ensure you never have 2 duplicate rows in the association table.
I think you might want to use a primary key in order to show your intent. If for example you do not want
a, b
a, b
Then a primary key defined on A.a and B.b would make that more clear. If you don't care, but you have a,b and other fields, then adding a surrogate key as your primary key might help in giving you a uniform way to delete a row that you do not want. Otherwise you will have to delete where a=a and b=b and ?? then pick some field value from the row you want deleted. Whereas with a surrogate key you can just pick the row and say delete where mykey = 36 or something...
But really it depends on the business case. Many intersect tables have some kind of date range, or additional fields related to the relationship in addition to the keys of the two tables. Defining a primary key on the existing columns, a new surrogate key, some unique indexes, some constraints, or even having no indexes could all be valid courses of action depending upon your needs.
I would say definitely do whatever makes your intentions the most clear.
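To illustrate the deletion point: when duplicate (a, b) pairs are allowed and nothing else distinguishes them, a surrogate key lets you target exactly one row. A sketch in SQLite via Python's sqlite3, with the hypothetical names a_b and mykey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_b (mykey INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO a_b (a, b) VALUES (?, ?)", [(1, 2), (1, 2)])

# DELETE ... WHERE a = 1 AND b = 2 would remove both copies; the key picks one.
first_key = conn.execute("SELECT MIN(mykey) FROM a_b").fetchone()[0]
conn.execute("DELETE FROM a_b WHERE mykey = ?", (first_key,))

remaining = conn.execute("SELECT a, b FROM a_b").fetchall()
print(remaining)  # [(1, 2)]
```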
Not needed. Both keys should form the primary key of your association table. If you're going to be doing bidirectional navigation, consider adding an index with the keys reversed.
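A sketch of that suggestion in SQLite via Python's sqlite3, with hypothetical names: the composite primary key serves lookups by a_id, and a second index with the columns reversed (ix_b_a) serves lookups by b_id. Plan wording varies between SQLite versions, so only the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_b (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)               -- serves "all Bs for this A"
    );
    CREATE INDEX ix_b_a ON a_b (b_id, a_id);   -- serves "all As for this B"
""")

def plan(sql):
    # Join the human-readable detail column of each EXPLAIN QUERY PLAN row.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_by_b = plan("SELECT a_id FROM a_b WHERE b_id = 5")
print(plan_by_b)
```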
A primary key is always needed.
However, I'd say it depends on what it should be. If you are going to use some sort of ORM system (e.g. Hibernate), then it is best to have a surrogate identifier, while the two foreign keys (pointing to tables A and B) should form a unique index.
Also, if there would ever be a need to reference such a relationship from another table then this surrogate identifier would be really handy.