(MySQL) Why should the ID column be set as Primary Key (being unique)?

I've heard Primary Key means to be unique. Correct me please if I'm wrong.
Assume we have a table of users. It has 3 columns: id, username and password. We usually set the id to be AUTO_INCREMENT. So it would technically make a new unique id each time we add a row to the table. Then why do we also set the id column to be Primary Key or Unique?

Having a column as a key offers other benefits. First, if it is primary or unique, this enforces that no query can enter a duplicate value for that key. Keys also allow you to do things like
INSERT ... ON DUPLICATE KEY UPDATE...
Of course you also want an index on the column for quick lookups.
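To make the enforcement point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere; SQLite's syntax and error messages differ from MySQL's), and the users table is the hypothetical one from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table layout follows the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"   # the key is what rejects duplicates
    " username TEXT NOT NULL,"
    " password TEXT NOT NULL)"
)
conn.execute("INSERT INTO users (id, username, password) VALUES (1, 'alice', 'x')")

duplicate_rejected = False
try:
    # An explicit id that already exists violates the primary key.
    conn.execute("INSERT INTO users (id, username, password) VALUES (1, 'bob', 'y')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A lookup by id is resolved through the key instead of a full table scan.
row = conn.execute("SELECT username FROM users WHERE id = 1").fetchone()
print(duplicate_rejected, row)
```

Without the key, the second insert would go through and the table would hold two rows with id 1.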

AUTO_INCREMENT behavior only manifests when the column is not specified during an insert. Consider:
CREATE TABLE ai (
ai int unsigned not null auto_increment,
oi int unsigned,
key (ai),
primary key (oi)
);
INSERT INTO ai VALUES (1,2);
INSERT INTO ai VALUES (1,3);
INSERT INTO ai VALUES (null,5);
This will yield (1,2), (1,3), (2,5). Note how the AUTO_INCREMENT column has a duplicate.

A primary key does two things:
(1) It enforces database integrity (uniqueness and not-null of the column).
(2) It creates an index to implement that, which also makes for fast look-up by the primary key column as a "side-effect".
You may not strictly need (1) if you can ensure that in your application code (for example by only using the auto-increment value), but it does not hurt.
You almost certainly want (2), though.
"So it would technically make a new unique id each time we add a row to the table"
Well, that is up to you. The auto-increment value only gets inserted if you don't specify an explicit value. And technically, it is not guaranteed to be unique: the auto-increment attribute on its own does not check the value against rows already in the table (which may have ended up there through explicit inserts).

Related

Inserting new data in mysql after creating new field

How can I insert new data into a column after adding the column, without using an update statement? For example:
alter table Employee add column Gender varchar(1) after Birthdate;
Then I get an error when I use this statement:
insert into Employee(ENumber,EmpName,Birthdate,Address,Salary,DNumber,Gender)
values
('E001','GSInocencio','1988-01-15','Munoz',18000,'D005','F'),
It gives me the error: Duplicate entry 'E001' for key 'PRIMARY'
The message is pretty clear: you already have an employee with that ENumber value.
That column is the PRIMARY KEY, which implies a UNIQUE constraint, so either pick a different value or use a different primary key.
One thing to note is that MySQL doesn't handle long string primary keys very efficiently, and they're also a real hassle for relating data since they're so big. It's usually better to include a standard id INT AUTO_INCREMENT PRIMARY KEY column and then have things like ENumber as a secondary UNIQUE constraint.
You can then relate data using the 4-byte id value, or an 8-byte BIGINT if you expect to outgrow INT, as you might with more than two billion employees.
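A rough sketch of that layout, again using Python's sqlite3 as a stand-in for MySQL/MariaDB (so AUTOINCREMENT is spelled the SQLite way). The paycheck table is hypothetical and only there to show another table relating to the small integer id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"   # surrogate key
    " enumber TEXT NOT NULL UNIQUE,"           # natural key kept unique
    " empname TEXT NOT NULL)"
)
# Hypothetical related table: it refers to the integer id, not to ENumber.
conn.execute(
    "CREATE TABLE paycheck ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " employee_id INTEGER NOT NULL REFERENCES employee(id),"
    " amount INTEGER NOT NULL)"
)

cur = conn.execute("INSERT INTO employee (enumber, empname) VALUES ('E001', 'GSInocencio')")
employee_id = cur.lastrowid
conn.execute("INSERT INTO paycheck (employee_id, amount) VALUES (?, 18000)", (employee_id,))

try:
    # A second 'E001' is still rejected, now by the UNIQUE constraint.
    conn.execute("INSERT INTO employee (enumber, empname) VALUES ('E001', 'Someone Else')")
    second_e001_accepted = True
except sqlite3.IntegrityError:
    second_e001_accepted = False

joined = conn.execute(
    "SELECT e.enumber, p.amount FROM paycheck p JOIN employee e ON e.id = p.employee_id"
).fetchall()
print(employee_id, second_e001_accepted, joined)
```

The duplicate check on ENumber is unchanged; only the column other tables point at has become small and stable.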

How to prevent duplicate row insert in MemSQL?

I have an AUTO_INCREMENT PRIMARY KEY and another column that I can't set UNIQUE because, unlike standard RDBMSs such as MySQL or PostgreSQL, MemSQL allows only one of them, not both.
Is there a workaround to prevent duplicate rows without sacrificing the auto_increment column?
I could use the unique column as the primary key and generate ids with an atomic counter in another product/service such as Redis, but then whenever I need to update the unique column I have to delete the row first and reinsert it, which I'd rather avoid.
MemSQL does support multiple unique keys together with a primary key. However, MemSQL requires that the columns in each unique or primary key must be a superset of the columns in the shard key - i.e. that all values that would be considered duplicate under each unique key have the same shard key, so that they get mapped to the same partition. This further implies that all the unique/primary keys must share at least one column in common.
For your case, it is not possible to have both a unique/primary key on the autoincrement column and the other column. But you can have a unique/primary key on the other column, without a unique key on the auto_increment column - just define it as a non-unique key. The automatically generated values will still be unique. Do note that then the table won't be able to enforce uniqueness if you manually insert values that are duplicate with other auto_increment values.

sql management studio [duplicate]

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing a new database for a new project and I have a dilemma:
In DB theory, the primary key is a fundamental element, that's OK, but in REAL projects what are the advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
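Both of these points can be tried out quickly. The sketch below uses Python's sqlite3 as a stand-in; SQLite happens to treat NULLs in a unique index the same way the answer describes for MySQL, but other engines may not. table2 and its index name are made up for the multi-column case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (foo INT, bar INT)")
conn.execute("CREATE UNIQUE INDEX ux_table1_foo ON table1 (foo)")
# Hypothetical second table with a unique index over two columns.
conn.execute("CREATE TABLE table2 (a INT, b INT)")
conn.execute("CREATE UNIQUE INDEX ux_table2_a_b ON table2 (a, b)")

def try_insert(sql):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    # Two NULLs in a uniquely indexed column do not collide.
    "first_null": try_insert("INSERT INTO table1 (foo, bar) VALUES (NULL, 1)"),
    "second_null": try_insert("INSERT INTO table1 (foo, bar) VALUES (NULL, 2)"),
    # With a two-column index only the combination has to be unique.
    "pair_1_1": try_insert("INSERT INTO table2 (a, b) VALUES (1, 1)"),
    "pair_1_2": try_insert("INSERT INTO table2 (a, b) VALUES (1, 2)"),
    "pair_1_1_again": try_insert("INSERT INTO table2 (a, b) VALUES (1, 1)"),
}
print(results)
```

Only the repeated (1, 1) pair is rejected; repeating the value 1 in column a alone is fine.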
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the representation of the element.
Meaning? Well, a primary key is used to identify the element. If you have a "Person", you would like to have a personal identification number (SSN or such) which is primary to your Person.
On the other hand, the person might have an e-mail address which is unique but doesn't identify the person.
I always have Primary Keys; I might have them even in relationship tables (the mid-table / connection table). Why? Well, I like to follow a standard when coding: if the "Person" has an identifier and the Car has an identifier, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
"A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table"
For transactional replication, you need the primary key. From Books Online:
"Tables published for transactional replication must have a primary key. If a table is in a transactional replication publication, you cannot disable any indexes that are associated with primary key columns. These indexes are required by replication. To disable an index, you must first drop the table from the publication."
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as "always" or "never" are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add just some information to @MrWiggles' and @Peter Parker's answers: when a table doesn't have a primary key, you won't be able to edit data in some applications (they will end up saying something like "cannot edit / delete data without primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column, while a PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least not without problems.
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary key to be present. One situation comes to mind: when using enterprise replication with Informix, a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but NULL is handled differently across databases (AFAIK MS-SQL does not allow more than one NULL value in a UNIQUE column, while MySQL and Oracle do).
So you must define this column NOT NULL UNIQUE INDEX
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce referential integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time you change the data (well, almost, depending on your fill factor) the physical table needs to be updated to stay sorted.
In relatively small tables this is fine, but when you get to tables that hold GBs worth of data, and inserts/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not-null constraint are the same (*), and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not-null, then make it a primary key. If it just happens that all parts of a unique index are not-null, without any requirement for that, then just make it a unique index.
The sole remaining difference is that you may have multiple not-null unique indexes, while you can't have multiple primary keys.
(*) Except for a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. For example, if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
In SQL Server, a unique index can have one NULL value, and by default it is created as a NON-CLUSTERED INDEX.
A primary key cannot contain NULL values, and by default it is created as a CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single-field indices whenever the where clause includes those fields, but it can only use a composite index through its leading fields - meaning it can't use the second field in a composite index unless the where clause also constrains the first.
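The leading-field behaviour of a composite index can be observed from the query planner. This is only a sketch: it uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN, whose output is not the same as MySQL's EXPLAIN, and the account table and index name are invented for illustration. Other engines and planner settings may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one composite index on (customer_id, region).
conn.execute("CREATE TABLE account (customer_id INT, region INT, note TEXT)")
conn.execute("CREATE INDEX idx_customer_region ON account (customer_id, region)")

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

# Constrains the leading field: the composite index is usable.
plan_first = plan("SELECT * FROM account WHERE customer_id = 1")
# Constrains both fields; the order they are written in does not matter.
plan_both = plan("SELECT * FROM account WHERE region = 2 AND customer_id = 1")
# Constrains only the second field: the index cannot be used for the search.
plan_second = plan("SELECT * FROM account WHERE region = 2")

for p in (plan_first, plan_both, plan_second):
    print(p)
```

What matters is which fields the where clause constrains, not the order in which they are typed.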
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices actually map that index's key value to a primary key value, not to rowid()'s. This allows for physical sorting operations and deletes to occur without having to recreate these indices.

Primary Key in SQL - Default value

I am working from this database; it's one of the first I have tried building:
http://sqlfiddle.com/#!2/38ef8
When I try to add this line:
Insert Into country (name) values ('US');
It says Field 'id' doesn't have a default value. Am I doing my primary key correctly? I have seen people using "auto_increment" on their primary key like this example:
http://sqlfiddle.com/#!2/c807a/2
Is that what I should be using?
If you didn't specify PRIMARY KEY column as AUTO_INCREMENT then you have to give values manually, for example:
INSERT INTO Country(id, name) values(1, 'US');
It's up to you whether to use AUTO_INCREMENT or not. There are many reasons to do it and many not to do it:
Pros and Cons of autoincrement keys on "every table"
Should each and every table have a primary key?
These are the properties of a PRIMARY KEY:
1: it can't be NULL
2: it can't be duplicated
Now, when you use AUTO_INCREMENT, every time you run the query
Insert Into country (name) values ('US');
it automatically generates the next number for the primary key column, normally one more than the highest value in the table.
But when you do not set the primary key as AUTO_INCREMENT,
Insert Into country (name) values ('US');
this query leaves every column except the given one to its default value.
The primary key column cannot be NULL and has no default of its own here,
so there is no value the database could store for it.
That is why you get the error.
I hope the explanation helps.
If you have not set your primary key as auto increment, you will have to insert that manually in your queries.
The primary key should be set to AUTO_INCREMENT, if it is not so, you will have to set that manually.
You can still insert specific values after setting the primary key to AUTO_INCREMENT, provided the key does not already exist :D
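The difference between the two setups can be reproduced with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL, so the auto-generated key is spelled INTEGER PRIMARY KEY AUTOINCREMENT and the error is SQLite's NOT NULL failure rather than MySQL's "doesn't have a default value". The two table names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No auto-generated value: INT (not INTEGER) keeps SQLite from aliasing the rowid.
conn.execute("CREATE TABLE country_manual (id INT NOT NULL PRIMARY KEY, name TEXT)")
# Auto-generated value, the SQLite counterpart of MySQL's AUTO_INCREMENT.
conn.execute("CREATE TABLE country_auto (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

try:
    # Nothing supplies an id, so the insert cannot succeed.
    conn.execute("INSERT INTO country_manual (name) VALUES ('US')")
    manual_insert_ok = True
except sqlite3.IntegrityError:
    manual_insert_ok = False

# Supplying the id by hand works.
conn.execute("INSERT INTO country_manual (id, name) VALUES (1, 'US')")

# With an auto-generated key the id can be omitted...
conn.execute("INSERT INTO country_auto (name) VALUES ('US')")
# ...or still given explicitly, as long as it is not taken yet.
conn.execute("INSERT INTO country_auto (id, name) VALUES (10, 'CA')")
conn.execute("INSERT INTO country_auto (name) VALUES ('MX')")

auto_ids = [r[0] for r in conn.execute("SELECT id FROM country_auto ORDER BY id")]
print(manual_insert_ok, auto_ids)
```

Note that the generated ids continue after the explicitly inserted value instead of filling the gap below it.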

Maintaining a large table of unique values in MySQL

This is probably a common situation, but I couldn't find a specific answer on SO or Google.
I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no duplicate rows. The table stores the user's uids. The SQL for the table is:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
user INT,
possiblefriend INT)
The way the table works is that each user has around 1000 or so "possible friends" that are discovered and need to be stored, but duplicate "possible friends" need to be avoided.
The problem is, due to the design of the program, over the course of a day, I need to add 1 million rows or more to the table that may or may not be duplicate row entries. The simple answer would seem to be to check each row to see if it is a duplicate, and if not, then insert it into the table. But this technique will probably get very slow as the table size increases to 100 million rows, 1 billion rows or higher (which I expect it to soon).
What is the best (i.e. fastest) way to maintain this unique table?
I don't need to have a table with only unique values always on hand. I just need it once-a-day for batch jobs. In this case, should I create a separate table that just inserts all the possible rows (containing duplicate rows and all), and then at the end of the day, create a second table that calculates all the unique rows in the first table?
If not, what is the best way for this table long-term?
(If indexes are the best long-term solution, please tell me which indexes to use)
Add a unique index on (user, possiblefriend) then use one of:
INSERT ... ON DUPLICATE KEY UPDATE ...
INSERT IGNORE
REPLACE
to ensure that you don't get errors when you try to insert a duplicate row.
You might also want to consider if you can drop your auto-incrementing primary key and use (user, possiblefriend) as the primary key. This will decrease the size of your table and also the primary key will function as the index, saving you from having to create an extra index.
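Here is a small sketch of the composite-primary-key variant combined with an ignore-duplicates insert. It runs on Python's sqlite3 as a stand-in for MySQL, so the statement is spelled INSERT OR IGNORE instead of MySQL's INSERT IGNORE; the batch contents are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The pair itself is the primary key, so no separate id column or extra index.
conn.execute(
    "CREATE TABLE possiblefriends ("
    " user INT NOT NULL,"
    " possiblefriend INT NOT NULL,"
    " PRIMARY KEY (user, possiblefriend))"
)

# Invented batch containing duplicates, as the daily load might.
batch = [(1, 2), (1, 3), (1, 2), (2, 1), (1, 3)]

# Duplicates are skipped by the key; no per-row existence check is needed.
conn.executemany(
    "INSERT OR IGNORE INTO possiblefriends (user, possiblefriend) VALUES (?, ?)",
    batch,
)

stored = conn.execute(
    "SELECT user, possiblefriend FROM possiblefriends ORDER BY user, possiblefriend"
).fetchall()
print(stored)
```

The duplicate check happens inside the index lookup that the insert performs anyway, which is why it tends to scale better than a separate SELECT per row.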
See also:
“INSERT IGNORE” vs “INSERT … ON DUPLICATE KEY UPDATE”
A unique index will let you be sure that the combination is indeed unique. You can add a unique index like so:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
user INT,
possiblefriend INT,
PRIMARY KEY (id),
UNIQUE INDEX DefUserID_UNIQUE (user ASC, possiblefriend ASC))
This will also speed up lookups on those columns significantly.
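Your other issue with the mass insert is a little more tricky. You could use the built-in ON DUPLICATE KEY UPDATE clause. If column a is declared UNIQUE and already contains the value 1, the first statement below has the same effect as the second (t1 is a placeholder table name):
INSERT INTO t1 (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE t1 SET c=c+1 WHERE a=1;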