Please confirm my use of primary key and unique index - mysql

I think I understand primary keys and indexes.
In my setup, I have a table with several columns. Two of these columns are User ID, and Username.
Ideally, I would like both to be unique and non-nullable.
As far as I can tell, my best option would be to have the User ID as the primary key, as this is the most important field not to be NULL, and it will never change as the database grows.
I would then have to have the username column as a unique index, so that it can be the same on another row, although, unfortunately, it could end up NULL.
Is there a way to have both columns unique and non-nullable? Otherwise, this is what I will do.

You can declare the Username column as NOT NULL and put a unique index on it. Although the index itself won't force non-null values, the column definition will, so it will effectively be a unique, non-nullable field.
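The sketch below shows the NOT NULL plus UNIQUE combination in action. To keep it runnable without a server, it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. The table and column names are hypothetical. The two constraints reject the same rows in MySQL, although the error types reported there differ.

```python
import sqlite3

def make_users_table() -> sqlite3.Connection:
    """Create a hypothetical users table: user_id is the PK, username is NOT NULL + UNIQUE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            user_id  INTEGER NOT NULL PRIMARY KEY,
            username TEXT    NOT NULL UNIQUE  -- not-null column definition + unique index
        )
        """
    )
    return conn

def try_insert(conn: sqlite3.Connection, user_id, username) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO users (user_id, username) VALUES (?, ?)", (user_id, username))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_users_table()
print(try_insert(conn, 1, "alice"))  # True: first row is fine
print(try_insert(conn, 2, "alice"))  # False: duplicate username violates UNIQUE
print(try_insert(conn, 3, None))     # False: NULL username violates NOT NULL
```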

From both my application development and data warehouse experience, I would recommend having a separate primary key that is not used in any business setting, rather than using User ID as the primary key. Using User ID as the primary key can lead to a whole host of problems. I would index each column (separately).
Any time you need to merge or reassign a user, change their ID, etc., having used their User ID as the primary key will cause a lot of problems for those operations.
Also, on the web, this exposes URLs like ....user/1/details, and people could potentially change the '1' to a '2' (for example) and see other people's info. It is better if the ID is hard to guess, like '57489574389ghfjghfjghf'; then URLs are harder to hack.
The choice between a 'natural' and a 'surrogate' key is explained well here:
http://www.agiledata.org/essays/keys.html
Most of the problems people experience in this area concern edge cases such as merges and deletes. These are usually of low priority initially, but concern over them grows over time, and poorly engineered solutions start to break down. This is usually because, by the time data quality is 'recognized', there is often such a large volume of 'bad' data that going forward is untenable: the old data can't be 'fixed', and without that, rules are hard to introduce for new records that must co-exist with it. This assumes that the ability to update old records is still required.

Nope, sorry to say you are incorrect on both counts.
1) You're right about everything, except that the PK can change if you want it to.
2) A unique index is, by definition, unique: its values cannot be repeated. What you mean is a plain old index, which is not unique and allows repeats. Its purpose is to speed up queries if you often filter by that field; otherwise, it is better not to use one.
What you want: Column1 = Primary Key (not null), Column2 = Unique Index (not null). That is exactly what you said, but now you know why it works the way you need it to.
EDIT: Also, it seems you draw a correlation between indexes and non-nullability. You can make a column non-nullable independently of whether it is indexed or not.
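Both points above can be checked with a small sketch. It again uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the table and index names are hypothetical. It shows three things:
- A plain index accepts repeated values.
- A unique index rejects repeated values.
- Nullability is a separate matter: a UNIQUE index on a nullable column still lets several NULLs in. MySQL's InnoDB also allows multiple NULLs in a UNIQUE index, which is why NOT NULL is needed as well.

```python
import sqlite3

def accepts_duplicates(index_kind: str) -> bool:
    """Return True if a table whose `email` column has the given index kind
    ('INDEX' or 'UNIQUE INDEX') accepts two rows with the same email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute(f"CREATE {index_kind} idx_email ON t (email)")
    try:
        conn.execute("INSERT INTO t (email) VALUES ('a@example.com')")
        conn.execute("INSERT INTO t (email) VALUES ('a@example.com')")
        return True
    except sqlite3.IntegrityError:
        return False

def count_null_emails_under_unique() -> int:
    """A UNIQUE index on a nullable column still accepts several NULLs,
    because one NULL is never considered equal to another."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    conn.executemany("INSERT INTO t (email) VALUES (?)", [(None,), (None,)])
    return conn.execute("SELECT COUNT(*) FROM t WHERE email IS NULL").fetchone()[0]

print(accepts_duplicates("INDEX"))         # True: a plain index only speeds up lookups
print(accepts_duplicates("UNIQUE INDEX"))  # False: a unique index rejects repeats
print(count_null_emails_under_unique())    # 2: only NOT NULL keeps NULLs out
```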

Totally agree with Michael: your primary key column should not contain any meaningful data, especially something like a user ID. So you should add another column for the PK and fill it from a sequence (in MySQL, an AUTO_INCREMENT column).
Also agree with Darhazer: you should put a not null constraint and a unique index on both the userid and username fields.

Related

Table without a primary key

So I've always been told that it's absolutely necessary to have a primary key specified with a table. I've been doing some work and ran into a situation where a primary key's unique constraint would stop data I need from being added.
Say, for example, a table is structured with the fields:
Age, First Name, Last Name, Country, Race, Gender
If a TON of data is being entered, these fields don't necessarily uniquely identify a row, and I don't need an index across all columns anyway. Would the only solution here be to make an auto-incrementing ID field? Would it be okay to NOT have a primary key at all?
It's not always necessary to have a primary key; most DBMSs will allow you to construct a table without one (a).
But that doesn't necessarily mean it's a good idea. Have a think about the situation in which you want to use that data. Now think about if you have two twenty-year-old Australian men named Bob Smith, both from Perth.
Without a unique constraint, you can put both rows into the table, but here's the rub: how would you figure out which one you want to use in future? (b)
Now, if you just want to store the fact that there are one or more people meeting those criteria, you only need to store one row. But then, you'd probably have a composite primary key consisting of all columns.
If you have other information you want to store about the person (e.g., highest score in the "2048" game on their iPhone), then you don't want a primary key across the entire row, just across the columns you mention.
Unfortunately, that means there will undoubtedly come a time when both of those Bob Smiths try to write their high scores to the database, only to find that one of them loses their information.
If you want them both in the table and still want to allow for the possibility outlined above (two people with identical attributes in the columns you mention) then the best bet is to introduce an artificial key such as an auto-incrementing column, for the primary key. That will allow you to uniquely identify a row regardless of how identical the other columns are.
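Here is a minimal sketch of the artificial-key approach for the two Bob Smiths. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the person table and high_score column are hypothetical. Both identical rows are accepted, and each one can still be updated on its own through its id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY AUTOINCREMENT is SQLite's closest equivalent to MySQL's AUTO_INCREMENT.
conn.execute(
    """
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial (surrogate) key
        age        INTEGER,
        first_name TEXT,
        last_name  TEXT,
        country    TEXT,
        high_score INTEGER
    )
    """
)
insert_sql = (
    "INSERT INTO person (age, first_name, last_name, country, high_score) "
    "VALUES (?, ?, ?, ?, 0)"
)
bob = (20, "Bob", "Smith", "Australia")
id_a = conn.execute(insert_sql, bob).lastrowid
id_b = conn.execute(insert_sql, bob).lastrowid  # identical attributes, new key

# Each Bob can now record his own score without overwriting the other's.
conn.execute("UPDATE person SET high_score = 2048 WHERE id = ?", (id_a,))
conn.execute("UPDATE person SET high_score = 512 WHERE id = ?", (id_b,))

scores = conn.execute("SELECT id, high_score FROM person ORDER BY id").fetchall()
print(scores)  # [(1, 2048), (2, 512)]
```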
The other advantage of an artificial key is that, being arbitrary, it never needs to change for the thing being identified. In your example, if you use age, names, nationality or location (c) in your primary key, these are all subject to change, meaning that you will need to adjust any foreign keys referencing those rows. If the tables referencing these rows use the unchanging artificial key, that will never be a problem.
(a) There are situations where a primary key doesn't really give you any performance benefit such as when the table is particularly small (such as mapping integers 1 through 12 to month name).
In other words, things where a full table scan isn't really any slower than indexing. But these situations are incredibly rare and I'd probably still use a key because it's more consistent (especially since the use of a key tends not to make a difference to the performance either way).
(b) Keep in mind that we're talking in terms of practice here rather than theory. While in practice you may create a table with no primary key, relational theory states that each row must be uniquely identifiable, otherwise relations are impossible to maintain.
C.J. Date, who along with Codd is one of the progenitors of relational database theory, states the rules of relational tables in "An Introduction to Database Systems", one of which is:
The records have a unique identifier field or field combination called the primary key.
So, in terms of relational theory, each table must have a primary key, even though it's not always required in practice.
(c) Particularly age, which is guaranteed to change annually until you're dead, so perhaps date of birth would be a better choice for that column.
Would the only solution here be to make an auto-incrementing ID field?
That is a valid way, but it is not the only one: you could use other ways to generate unique keys, such as using GUIDs. Keys like that are called surrogate primary keys, because they are not related to the "payload" of the data row.
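As a sketch of the GUID variant, the example below generates a random version-4 UUID in the application with Python's uuid module and uses it as a surrogate primary key. SQLite (via sqlite3) stands in for MySQL here, and the table is hypothetical. Storing the UUID as 36-character text is only a simplicity choice for the sketch. MySQL also has a UUID() function if you would rather generate the value on the server.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY NOT NULL, first_name TEXT, last_name TEXT)")

def add_person(first: str, last: str) -> str:
    """Insert a row keyed by a freshly generated random UUID and return the key."""
    key = str(uuid.uuid4())  # surrogate key: unrelated to the row's payload
    conn.execute("INSERT INTO person (id, first_name, last_name) VALUES (?, ?, ?)", (key, first, last))
    return key

k1 = add_person("Bob", "Smith")
k2 = add_person("Bob", "Smith")  # identical payload, different surrogate key
print(k1 != k2)  # True
```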
Would it be okay to NOT have a primary at all?
Since you mentioned that the actual data in rows may not be unique, you wouldn't be able to use your table effectively without a primary key. For example, you would not be able to update or delete a specific row, which may be required, for example, when a user's name changes.
The simplest solution would be to include an ID column to serve as the primary key:
id int not null primary key auto_increment
From your post, it looks like the table represents a person entity. In that case, a PK would identify each person entity uniquely, so I would suggest having a primary key on the table that uniquely identifies each person record.
You can either create an AUTO_INCREMENT ID column (a synthetic ID column)
(OR)
You can combine multiple columns that together uniquely determine all the other fields, such as (First Name, Last Name), into a composite primary key. That may clash as well, though, since more than one person can have the same full name (first name + last name).
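The clash described above is easy to reproduce. In this sketch, SQLite through Python's sqlite3 module stands in for MySQL, and the schema is hypothetical. A composite primary key on (first_name, last_name) rejects a second, different Bob Smith.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key over the two name columns.
conn.execute(
    """
    CREATE TABLE person (
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        country    TEXT,
        PRIMARY KEY (first_name, last_name)
    )
    """
)
conn.execute("INSERT INTO person VALUES ('Bob', 'Smith', 'Australia')")
try:
    # A different Bob Smith from another country collides on the composite key.
    conn.execute("INSERT INTO person VALUES ('Bob', 'Smith', 'Canada')")
    second_accepted = True
except sqlite3.IntegrityError:
    second_accepted = False

print(second_accepted)  # False
```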
Typically, you should avoid proliferating ID primary key fields through your database.
Now, that doesn't mean you shouldn't have primary keys, your primary key can be a surrogate or a composed key. And that's what you should do here.
If those fields {Age, First Name, Last Name, Country, Race, Gender} unequivocally identify each row, then make a primary key composed of all of those fields.
But if not, then you must have some other type of information to disambiguate your data.
You can also not specify any kind of key, and treat the table as a non-normalized, redundant data source, if that is what you need.
Use an identity column with another column such as Last Name

Why should we have an ID column in the users table?

It's obvious that we already have another piece of unique information about each user, and that is the username. So why do we need another unique thing for each user? Why should we also have an id for each user? What would happen if we omitted the id column?
Even if your username is unique, there are a few advantages to having an extra id column instead of using the varchar as your primary key.
Some people prefer to use an integer column as the primary key, to serve as a surrogate key that never needs to change, even if other columns are subject to change. Although there's nothing preventing a natural primary key from being changeable too, you'd have to use cascading foreign key constraints to ensure that the foreign keys in related tables are updated in sync with any such change.
A 32-bit integer primary key instead of a varchar can save space. Every other table that references your user table needs a foreign key column of the same type, so the choice between an int and a varchar there can be a good reason on its own.
Inserting into the primary key index is a little more efficient if new rows are added at the end of the index, compared to wedging them into the middle of it. Indexes in MySQL tables are usually B+Tree data structures, and you can study these to understand how they perform.
Some application frameworks prefer the convention that every table in your database has a primary key column called id, instead of using natural keys or compound keys. Following such conventions can make certain programming tasks simpler.
None of these issues are deal-breakers. And there are also advantages to using natural keys:
If you look up rows by username more often than you search by id, it can be better to choose the username as the primary key, and take advantage of the index-organized storage of InnoDB. Make your primary lookup column be the primary key, if possible, because primary key lookups are more efficient in InnoDB (you should be using InnoDB in MySQL).
As you noticed, if you already have a unique constraint on username, it seems a waste of storage to keep an extra id column you don't need.
Using a natural key means that foreign keys contain a human-readable value, instead of an arbitrary integer id. This allows queries to use the foreign key value without having to join back to the parent table for the "real" value.
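A small sketch of that last point follows. SQLite through Python's sqlite3 module stands in for MySQL, and the users and comments tables are hypothetical. The child table references users by username, so a query on the child table already returns the readable value without joining back to the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY NOT NULL)")
conn.execute(
    """
    CREATE TABLE comments (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL REFERENCES users (username),  -- natural-key foreign key
        body     TEXT
    )
    """
)
conn.execute("INSERT INTO users VALUES ('alice')")
conn.execute("INSERT INTO comments (username, body) VALUES ('alice', 'hello')")

# The readable username is already in the child table, so no JOIN back to users is needed.
rows = conn.execute("SELECT username, body FROM comments").fetchall()
print(rows)  # [('alice', 'hello')]
```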
The point is that there's no rule that covers 100% of cases. I often recommend that you should keep your options open, and use natural keys, compound keys, and surrogate keys even in a single database.
I cover some issues of surrogate keys in the chapter "ID Required" in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
This identifier is known as a Surrogate Key. The page I linked lists both the advantages and disadvantages.
In practice, I have found them to be advantageous because even superkey data can change over time (e.g. a user's email address may change, and thus any corresponding relations must change), but a surrogate key never needs to change for the data it identifies because its value is meaningless to the relation.
It's also nice from a JOIN standpoint because it can be an integer with a smaller key length than a varchar.
I can say that in practice I prefer to use them. I have been bitten too many times by having multiple-column primary keys or a data-representative superkey used across tables having to become non-unique later due to changing requirements during development, and that is not a situation you want to deal with.
In my opinion, every table should have a unique, auto-incremented id.
Here are some practical reasons. If you have duplicate rows, you can readily determine which row to delete. If you want to know the order in which rows were inserted, you have that information in the id. As for users, there's more than one "John Smith" in the world. An id provides a key for foreign references.
Finally, just about anything that might describe a user -- a name, an address, a telephone number, an email address -- could change over time.
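The "which duplicate to delete" point can be sketched as follows. SQLite through Python's sqlite3 module stands in for MySQL, and the table and data are hypothetical. The id keeps the earliest copy of each duplicate group and removes the rest. MySQL rejects a DELETE whose subquery reads the same table directly (error 1093), so there the subquery typically has to be wrapped in a derived table or rewritten as a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("John Smith", "john@example.com"),
     ("John Smith", "john@example.com"),    # accidental duplicate
     ("John Smith", "other@example.com")],  # a different John Smith
)
# Keep the lowest id in each (name, email) group; the id tells identical rows apart.
conn.execute(
    """
    DELETE FROM users
    WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY name, email)
    """
)
remaining = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
print(remaining)  # [(1, 'john@example.com'), (3, 'other@example.com')]
```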
In MySQL we have:
1) index fields, 2) unique fields, and 3) PK fields.
Index means the field can be looked up quickly.
Unique means the value must occur only once across all rows of the table.
PK = index + unique (and the column must also be NOT NULL).
In a table you may have lots of unique fields, like
username, passport code, or email.
But you still need a field like ID that is both unique and indexed (i.e. the PK). First, it is always one value that never changes; second, it is unique; and third, it is simple (because it is often a number).
One reason to have a numeric id is that an index on it is leaner than one on a text field, reducing index size and the processing time required to look up a specific user. It is also fewer bytes to store when cross-referencing a user (relational database) from a different table.

mysql auto increment primary key running out

I maintain a table with an ID AUTO_INCREMENT PRIMARY KEY. When I delete an entry and re-add one, the new entry does not take the ID of the previous one; instead, it increments again by one. Is that normal, and is it advisable not to change this behavior? I just have a feeling this is creating a non-scalable system, as eventually it could run out of IDs.
This is by design; millions of databases have integer primary keys like these.
If you delete 90% of your inserts, you will run out of keys after about 400 million rows (1).
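If and when you do, you can run:
ALTER TABLE `test`.`table1` MODIFY COLUMN `item_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
, ROW_FORMAT = DYNAMIC;
where column item_id is your primary key. Note that MODIFY COLUMN replaces the whole column definition, so AUTO_INCREMENT has to be repeated or the column will lose it.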
After that you'll never have to worry about running out of key-space again.
Don't be tempted to start out with a bigint primary key!
It will make all your queries slower.
It will make the tables bigger.
On InnoDB the primary key is included on every secondary index, making a small primary key much faster with inserts.
For most tables you will never need it.
If you know your big table will have more rows than an integer can hold, then by all means make it a bigint, but you should only do this for tables that really need it, especially InnoDB tables.
Don't use a GUID; it's just a lot of wasted space, slowing everything way down for no reason 99.99% of the time.
(1) Using an unsigned (!) integer as the primary key.
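The 400-million figure can be reproduced with a little arithmetic. This sketch makes the following assumptions:
- The column is INT UNSIGNED (maximum 4,294,967,295), as in the footnote.
- Ids start at 1 and are never reused.
- A fixed percentage of inserted rows is deleted.

The insert rate of one million rows per second used for the BIGINT case is an arbitrary illustration.

```python
# Back-of-the-envelope key-space arithmetic for an auto-increment column.
INT_SIGNED_MAX = 2**31 - 1        # 2,147,483,647: largest MySQL INT
INT_UNSIGNED_MAX = 2**32 - 1      # 4,294,967,295: largest MySQL INT UNSIGNED
BIGINT_UNSIGNED_MAX = 2**64 - 1   # largest MySQL BIGINT UNSIGNED

def live_rows_at_exhaustion(max_id: int, deleted_percent: int) -> int:
    """Rows still in the table when the counter reaches max_id, assuming ids start
    at 1, are never reused, and deleted_percent of all inserted rows were deleted."""
    return max_id * (100 - deleted_percent) // 100

def years_to_exhaust(max_id: int, inserts_per_second: float) -> float:
    """How long a constant insert rate takes to use up every id (365-day years)."""
    return max_id / inserts_per_second / (365 * 24 * 3600)

print(live_rows_at_exhaustion(INT_UNSIGNED_MAX, 90))            # 429496729, roughly 400 million
print(round(years_to_exhaust(BIGINT_UNSIGNED_MAX, 1_000_000)))  # about 585 thousand years
```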
User niceguy07 uploads a picture of his kitten. The picture is saved as 000012334.jpg because you use primary keys as filenames instead of putting untrusted user data into them (which is a good idea).
niceguy07 sends a link with ?picture_id=12334 to his date.
niceguy07 deletes his kitten pictures and user fatperv08 uploads a picture of himself wearing only a batman mask.
Your database reuses primary keys, so, unfortunately, the link with ?picture_id=12334 now points to a picture of a naked fat perv wearing a batman mask.
Re-using primary key values of deleted records is an extremely bad idea. It is, in fact, a bug if the primary key leaks out of the database because you use it in:
a URL
a filename
dumped along with other data in a file
etc.
Since it is, in fact, very useful to do all of the above, not reusing primary key ids is a good idea...
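A sketch of the non-reusing behavior is below. It uses SQLite through Python's sqlite3 module, where the AUTOINCREMENT keyword guarantees that an id is never handed out twice, even after the row with the highest id is deleted. The picture table is hypothetical. MySQL's AUTO_INCREMENT follows its own rules: before MySQL 8.0, InnoDB recalculated the counter from MAX(id) after a server restart, so ids of deleted trailing rows could be issued again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite remember the highest id ever used, so ids are never reused.
conn.execute("CREATE TABLE picture (id INTEGER PRIMARY KEY AUTOINCREMENT, owner TEXT)")

first = conn.execute("INSERT INTO picture (owner) VALUES ('niceguy07')").lastrowid
conn.execute("DELETE FROM picture WHERE id = ?", (first,))  # the kitten picture is deleted
second = conn.execute("INSERT INTO picture (owner) VALUES ('someone_else')").lastrowid

print(first, second)  # 1 2: an old link to id 1 now points at nothing instead of a stranger's picture
```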
It's fine. Depending on how many records you expect, you may want to make sure it's a bigint type, but int should be fine in most cases.
It is normal. Don't worry about IDs being sequential.
It depends on the number of records you plan on having and how often they are deleted and added, but if it's going to be an issue, use a bigint primary key.
There are also other options, such as using a GUID, if you are truly worried about running out of IDs, but I've never run into a situation where I actually needed a bigint; I just occasionally use them on volatile tables to be safe.
A primary key in database design should be seen as a unique identifier of the information being represented. By removing an ID from a table, you are essentially saying that this record shall be no more. If something were to take its place, you would be saying that the record was brought back to life. Now, technically, when you removed it in the first place, you should have removed all foreign key references as well. One may be tempted at this point to say that, since all traces are gone, there is no reason something shouldn't take its place. Well, what about backups? Let's say you removed the record by accident and its ID ended up reused by something else: you would not be able to easily restore that record. This could also potentially cause problems with rollbacks. So not only is reusing IDs unnecessary, it is fundamentally wrong in the theory of database design.
Adding to dykstrad's great comment:
For example, suppose the primary key points to a point-of-sale price, say for a Diet Coke. With sloppy housekeeping, you delete the old price and reinsert the new one. Good housekeeping would dictate that you update the item instead, which preserves the key and referential relationships, as it is still the same item / unique relationship.
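Here is a sketch of the good-housekeeping version. SQLite through Python's sqlite3 module stands in for MySQL, and the item and sale tables are hypothetical. Updating the price in place keeps the item's key, so the sale that references it stays valid. The sloppy DELETE step is refused here because a foreign key still points at the row. Without that constraint, DELETE followed by INSERT would give the same item a brand-new id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, price_cents INTEGER)")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, item_id INTEGER NOT NULL REFERENCES item (id))")

coke = conn.execute("INSERT INTO item (name, price_cents) VALUES ('Diet Coke', 150)").lastrowid
conn.execute("INSERT INTO sale (item_id) VALUES (?)", (coke,))

# Good housekeeping: change the price in place; the key and the sale's reference stay intact.
conn.execute("UPDATE item SET price_cents = 175 WHERE id = ?", (coke,))

# Sloppy housekeeping starts with a DELETE, which the foreign key rejects here.
try:
    conn.execute("DELETE FROM item WHERE id = ?", (coke,))
    deleted = True
except sqlite3.IntegrityError:
    deleted = False

print(conn.execute("SELECT id, price_cents FROM item").fetchall(), deleted)  # [(1, 175)] False
```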

Is it necessary to have a primary key ID with Auto Increment if I have a UNIQUE field INT?

I want to store in a table a list of IP ADDRESS to check later if some IP is already used in my system.
I want to store the IP in longip mode (signed int). Since each IP is unique, I want to know if it is necessary to have a primary key field (id, with autoincrement), or if it's okay (and better) to just use my longip field as the primary key.
If in the future you have to use the key to join with another table, the other table must contain the whole number too, and that's a lot of space wasted.
For example, say you have a "computer" table.
In that table, you have computers with IPs. For joining, you need a key, right? So if you join by key, you would have the computer id and the key (in this case the IP).
I highly recommend using a simpler id with autoincrement, as has been done since the mainframe days (AS/400, iSeries, etc.).
I think Marc_s' answer to the question When not to use surrogate primary keys? can guide us
I would say the following criteria must be met:
your natural key must be absolutely, positively, no-exceptions-allowed, unique (things like names, social security numbers etc. usually seem to be unique - but really aren't)
your natural key should be as small as an INT, i.e. not significantly more than 4 bytes in size (don't use a VARCHAR(50) for your PK, and especially not for your clustering key in SQL Server!)
your natural key ought to be stable, i.e. never change (OK, with ISO country codes, this is almost a given - except when countries like Yugoslavia or the USSR collapse, or others like the two Germanies unite - but that's rare enough)
If those conditions are met, you can consider a natural key as your PK - but that should be the 2% exception in all your tables - not the norm.
So I would say you should probably use a surrogate primary key. You can always use the IP as a unique key if you want to.
Since you're using the longip (an integer), which as you pointed out is unique, it's probably OK to use it as the primary key.
Almost every time you start out with a natural key, you will end up regretting it later. Something will happen; it's Murphy's law. Spare yourself the trouble and just add the ID column.
If it's the auto-increment you don't like, just use a UUID. MySQL has a UUID() function to make that easy.
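Here is a sketch of the IP-as-key approach. It uses Python's ipaddress module to compute the same 32-bit value that MySQL's INET_ATON() returns, and SQLite through sqlite3 as a stand-in for MySQL. The used_ip table is hypothetical. One caveat about "signed int": every address from 128.0.0.0 upward is larger than a signed 32-bit INT can hold. MySQL's documentation recommends an INT UNSIGNED column for INET_ATON() values.

```python
import ipaddress
import sqlite3

SIGNED_INT_MAX = 2**31 - 1  # largest value a signed 32-bit INT can hold

def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer value (same as MySQL's INET_ATON)."""
    return int(ipaddress.IPv4Address(ip))

conn = sqlite3.connect(":memory:")
# The IP itself is the primary key; there is no surrogate id column.
conn.execute("CREATE TABLE used_ip (ip INTEGER PRIMARY KEY NOT NULL)")

def mark_used(ip: str) -> bool:
    """Record an IP; return False if it was already recorded."""
    try:
        conn.execute("INSERT INTO used_ip (ip) VALUES (?)", (ip_to_int(ip),))
        return True
    except sqlite3.IntegrityError:
        return False

print(ip_to_int("192.168.1.1"))                            # 3232235777
print(ip_to_int("192.168.1.1") > SIGNED_INT_MAX)           # True: needs INT UNSIGNED
print(mark_used("192.168.1.1"), mark_used("192.168.1.1"))  # True False
```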

MySQL auto increment primary key IDs

I have some MySQL tables that have auto-incrementing IDs as primary keys, but I notice that I never actually use them... I used to think that every table must have a primary key, so I guess that is why I created them. Should I remove them all if I don't use them at all?
Unless you are running into space problems I wouldn't remove them.
They are a life saver in case you by mistake (or oversight) populate the database with repeated/wrong data.
They also help to have related tables, where you reference the content on one table through the autogenerated id.
This is assuming you have indexes for the other columns you use to actually query the data (if you don't, then more reason to keep the autoincrement ids and use them!).
No.
You should keep them; a database always needs something that differentiates a row from another row (a "Key" of some sort).
If you have something that is guaranteed to be unique for each row, then you can use that as a key; otherwise keep the Primary Key and the Auto generated ID.
I'd personally keep them. They will be especially useful at a later date if you expand the database design and need to reference this table.
Interesting!...
I seem to hold a minority opinion here, getting both upvoted and downvoted to currently an even 0, yet no one in the majority opinion (see responses above) seems to make much of a case for keeping the id field, and the downvoters didn't even bother leaving comments hinting at why doing away with the id is such a bad idea.
In their defense, my own original response did not include any strong argument as to why it is OK to do away with the id attribute in some cases (which seem to apply to the OP). Maybe such a gratuitous response is, in and of itself, a downvotable response.
Please do educate me, and the OP, by leaving comments for or against the _systematic_ (and I stress "systematic") need to include auto-incremented non-semantic primary keys in all tables. As promised, I returned and added to my response a list of reasons why it may be detrimental to [again, systematically] impose an auto-incremented PK.
My original response:
You bet! You can remove these!
Before you do anything to the database, make sure you have a backup, in particular if the DB size is significant.
Use the ALTER TABLE statement to remove the id in the tables where you want to remove it. Specifically:
ALTER TABLE myTable DROP COLUMN id;
(In some DBMSs, such as SQL Server, you also need to remove the PK constraint before removing the id; MySQL drops an index automatically once all of its columns are dropped.)
EDIT (Added later)
There are many cases where it just doesn't make sense to carry along an auto-incremented ID key, regardless of the relatively little extra storage these keys require.
In all these cases, the underlying implication is that
either the data itself supplies a primary key,
or, the application manages the key generation
The key supplied "natively" in the data doesn't necessarily need to be a single-column key; it can be a composite key, although in these cases one may wish to study the situation more closely, particularly if the overall key is a bit long.
Here are some of the drawbacks of using an auto-incremented primary key in lieu of a native or application-supplied key:
The effective data integrity may go unchecked
i.e. the server may allow record insertions or updates that create a duplicated [native] key (even though the artificial, auto-incremented primary key hides this reality)
When relying on the auto-incremented PK for the support of joins between tables, when part of the [native] key values have to be updated...
...we either create the need to delete the record in full and re-insert it with the new values,
...or the risk of keeping outdated/incorrect links.
A common "follow-up" with auto-incremented keys is to create a clustered index on the table for this key.
This does make sense for tables without a native or application-supplied primary key, but not so much for data sets that have such keys.
Effectively this prevents choosing a key for the clustered index which may be more beneficial for the most common query patterns.
Migrating tables with an auto-incremented key can be made more difficult depending on the DBMS (you need to declare the underlying column as a plain integer prior to the copy, then restart the autoincrement...)
For narrow tables, i.e. tables with a few columns only, the relative cost of the auto-incremented PK can be significant, and impact performance in a non negligible fashion.
When inserting new records along with associated records in related tables, the auto-incremented key needs to be obtained after the insertion of the main record, before the related records can be inserted; the logic is simpler when the column values supporting the link are known ahead of time.
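The ordering constraint in that last point looks roughly like this. The sketch uses SQLite through Python's sqlite3 module, where the generated key comes back as cursor.lastrowid. In MySQL the equivalent is LAST_INSERT_ID(). The orders and order_line tables are hypothetical. The parent must be inserted and its key read back before any child row can be built. With a natural or application-supplied key, the child rows could be prepared up front.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT)")
conn.execute("CREATE TABLE order_line (order_id INTEGER NOT NULL REFERENCES orders (id), sku TEXT)")

def create_order(customer: str, skus: list[str]) -> int:
    """Insert a parent row, read back its generated key, then insert the child rows."""
    with conn:  # one transaction for the parent and its children
        cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
        order_id = cur.lastrowid  # the key only exists after the parent insert
        conn.executemany(
            "INSERT INTO order_line (order_id, sku) VALUES (?, ?)",
            [(order_id, sku) for sku in skus],
        )
    return order_id

order_id = create_order("acme", ["A1", "B2"])
lines = conn.execute("SELECT sku FROM order_line WHERE order_id = ? ORDER BY sku", (order_id,)).fetchall()
print(order_id, lines)  # 1 [('A1',), ('B2',)]
```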
To summarize, the idea that so long as the storage can carry the [relatively minimal] extra "weight" of the artificial primary key, we should include and use such a key, is not without drawbacks of its own.
A final consideration is that just like it is rather easy to remove such keys when we don't need them, they too can be easily added, post-facto, when/if it becomes apparent that they are useful in a particular situation. Neither form of refactoring (adding vs. removing the auto-incremented columns) is risk free, but neither is a major production either.
Yes, if you can figure out another primary key.
There is obviously a flaw in your table design. For example, say you had a table like
relation_id(PK), parent_id, child_id .
It is known that the combination of parent_id and child_id is unique, then you can assign the primary key to be parent_id + child_id, and then drop the column relation_id.
There may be endless other possible cases, but just bear in mind that a primary key helps you locate data quickly, as well as helping your design make sense.
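Here is a sketch of the relation_id-free design described above. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL and keeps the table and column names from the example. The (parent_id, child_id) pair is the primary key, so the same link cannot be stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# relation_id dropped: the (parent_id, child_id) pair itself is the primary key.
conn.execute(
    """
    CREATE TABLE relation (
        parent_id INTEGER NOT NULL,
        child_id  INTEGER NOT NULL,
        PRIMARY KEY (parent_id, child_id)
    )
    """
)

def link(parent_id: int, child_id: int) -> bool:
    """Add a parent/child pair; return False if that exact pair already exists."""
    try:
        conn.execute("INSERT INTO relation (parent_id, child_id) VALUES (?, ?)", (parent_id, child_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(link(1, 2), link(1, 3), link(1, 2))  # True True False
```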